285 research outputs found

    PubChem3D: a new resource for scientists

    Background: PubChem is an open repository for small molecules and their experimental biological activity. PubChem integrates and provides search, retrieval, visualization, analysis, and programmatic access tools in an effort to maximize the utility of contributed information. There are many diverse chemical structures with similar biological efficacies against targets available in PubChem that are difficult to interrelate using traditional 2-D similarity methods. A new layer called PubChem3D is added to PubChem to assist in this analysis.
    Description: PubChem generates a 3-D conformer model description for 92.3% of all records in the PubChem Compound database (when considering the parent compound of salts). Each of these conformer models is sampled to remove redundancy, guaranteeing a minimum (non-hydrogen atom pair-wise) RMSD between conformers. A diverse conformer ordering gives a maximal description of the conformational diversity of a molecule when only a subset of available conformers is used. A pre-computed search per compound record gives immediate access to a set of 3-D similar compounds (called "Similar Conformers") in PubChem and their respective superpositions. Systematic augmentation of PubChem resources to include a 3-D layer provides users with new capabilities to search, subset, visualize, analyze, and download data.
    A series of retrospective studies help to demonstrate important connections between chemical structures and their biological function that are not obvious using 2-D similarity but are readily apparent by 3-D similarity.
    Conclusions: The addition of PubChem3D to the existing contents of PubChem is a considerable achievement, given the scope, scale, and the fact that the resource is publicly accessible and free. With the ability to uncover latent structure-activity relationships of chemical structures, while complementing 2-D similarity analysis approaches, PubChem3D represents a new resource for scientists to exploit when exploring the biological annotations in PubChem.

    PubChem3D: Similar conformers

    Background: PubChem is a free and open public resource for the biological activities of small molecules. With many tens of millions of both chemical structures and biological test results, PubChem is a sizeable system with an uneven degree of available information. Some chemical structures in PubChem include a great deal of biological annotation, while others have little to none. To help users, PubChem pre-computes "neighboring" relationships to relate similar chemical structures, which may have similar biological function. In this work, we introduce a "Similar Conformers" neighboring relationship to identify compounds with similar 3-D shape and similar 3-D orientation of functional groups typically used to define pharmacophore features.
    Results: The first two diverse 3-D conformers of 26.1 million PubChem Compound records were compared to each other, using a shape Tanimoto (ST) of 0.8 or greater and a color Tanimoto (CT) of 0.5 or greater, yielding 8.16 billion conformer neighbor pairs and 6.62 billion compound neighbor pairs, with an average of 253 "Similar Conformers" compound neighbors per compound. Comparing the 3-D neighboring relationship to the corresponding 2-D neighboring relationship ("Similar Compounds") for molecules such as caffeine, aspirin, and morphine, one finds unique sets of related chemical structures, providing additional significant biological annotation. The PubChem 3-D neighboring relationship is also shown to be able to group a set of non-steroidal anti-inflammatory drugs (NSAIDs), despite limited PubChem 2-D similarity.
    In a study of 4,218 chemical structures of biomedical interest, consisting of many known drugs, using more diverse conformers per compound results in more 3-D compound neighbors per compound; however, the compound neighbor lists per conformer also increasingly resemble each other, being 38% identical at three conformers and 68% at ten conformers. Perhaps surprising is that the average count of conformer neighbors per conformer increases rather slowly as a function of diverse conformers considered, with only a 70% increase for a tenfold growth in conformers per compound (a 68-fold increase in the conformer pairs considered).
    Neighboring 3-D conformers on the scale performed, if implemented naively, is an intractable problem using a modest sized compute cluster. Methodology developed in this work relies on a series of filters to prevent performing 3-D superposition optimization when it can be determined that two conformers cannot possibly be neighbors. Most filters are based on Tanimoto equation volume constraints, avoiding incompatible conformers; however, others consider preliminary superposition between conformers using reference shapes.
    Conclusion: The "Similar Conformers" 3-D neighboring relationship locates similar small molecules of biological interest that may go unnoticed when using traditional 2-D chemical structure graph-based methods, making it complementary to such methodologies. The computational cost of 3-D similarity methodology on a wide scale, such as PubChem contents, is a considerable issue to overcome. Using a series of efficient filters, an effective throughput rate of more than 150,000 conformers per second per processor core was achieved, more than two orders of magnitude faster than without filtering.
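    The volume-based filtering mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not spell out the filters themselves, so what follows is only the simplest bound one can derive from the Tanimoto equation, ST = V_AB / (V_AA + V_BB - V_AB), under the assumption that the overlap volume V_AB can never exceed the smaller of the two self-overlap volumes. Under that assumption ST is at most min(V_AA, V_BB) / max(V_AA, V_BB), so a pair whose volume ratio falls below the 0.8 threshold can be rejected without any superposition. The function names are hypothetical.

```python
def shape_tanimoto(v_aa: float, v_bb: float, v_ab: float) -> float:
    """Shape Tanimoto from two self-overlap volumes and their overlap volume."""
    return v_ab / (v_aa + v_bb - v_ab)


def max_possible_st(v_aa: float, v_bb: float) -> float:
    """Upper bound on ST, assuming v_ab <= min(v_aa, v_bb).

    Substituting v_ab = min(v_aa, v_bb) into the Tanimoto equation
    leaves min / max, the best score any superposition could reach.
    """
    return min(v_aa, v_bb) / max(v_aa, v_bb)


def can_be_neighbor(v_aa: float, v_bb: float, st_threshold: float = 0.8) -> bool:
    """Cheap pre-filter: False means the pair can be skipped outright."""
    return max_possible_st(v_aa, v_bb) >= st_threshold
```

    A pair with volumes 100 and 70 has a best-case ST of 0.7 and is discarded before any optimization, while a pair with volumes 100 and 90 still has to be superimposed and scored.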

    Discovery of Novel Glycogen Synthase Kinase-3beta Inhibitors: Molecular Modeling, Virtual Screening, and Biological Evaluation

    Glycogen synthase kinase-3 (GSK-3) is a multifunctional serine/threonine protein kinase which is engaged in a variety of signaling pathways, regulating a wide range of cellular processes. Due to its distinct regulation mechanism and unique substrate specificity in the molecular pathogenesis of human diseases, GSK-3 is one of the most attractive therapeutic targets for the unmet treatment of pathologies, including type-II diabetes, cancers, inflammation, and neurodegenerative disease. Recent advances in drug discovery targeting GSK-3 involved extensive computational modeling techniques. Both ligand-based and structure-based approaches have been well explored to design ATP-competitive inhibitors. Molecular modeling plus dynamics simulations can provide insight into the protein-substrate and protein-protein interactions at the substrate binding pocket and C-lobe hydrophobic groove, which will benefit the discovery of non-ATP-competitive inhibitors. To identify structurally novel and diverse compounds that effectively inhibit GSK-3β, we performed virtual screening by implementing a mixed ligand/structure-based approach, which included pharmacophore modeling, diversity analysis, and ensemble docking. The sensitivities of different docking protocols to the induced-fit effects at the ATP-competitive binding pocket of GSK-3β have been explored. An enrichment study was employed to verify the robustness of ensemble docking compared to individual docking in terms of retrieving active compounds from a decoy dataset. A total of 24 structurally diverse compounds obtained from the virtual screening experiment underwent biological validation. The bioassay results showed that 15 out of the 24 hit compounds are indeed GSK-3β inhibitors, and among them, one compound exhibiting sub-micromolar inhibitory activity is a reasonable starting point for further optimization. 
To further identify structurally novel GSK-3β inhibitors, we performed virtual screening by implementing another mixed ligand-based/structure-based approach, which included quantitative structure-activity relationship (QSAR) analysis and docking prediction. To integrate and analyze complex data sets from multiple experimental sources, we drafted and validated hierarchical QSAR, which adopts a multi-level structure to take data heterogeneity into account. A collection of 728 GSK-3 inhibitors with diverse structural scaffolds were obtained from published papers of 7 research groups based on different experimental protocols. Support vector machines and random forests were implemented with wrapper-based feature selection algorithms in order to construct predictive learning models. The best models for each single group of compounds were then selected, based on both internal and external validation, and used to build the final hierarchical QSAR model. The predictive performance of the hierarchical QSAR model can be demonstrated by an overall R² of 0.752 for the 141 compounds in the test set. The compounds obtained from the virtual screening experiment underwent biological validation. The bioassay results confirmed that 2 hit compounds are indeed GSK-3β inhibitors exhibiting sub-micromolar inhibitory activity, and therefore validated hierarchical QSAR as an effective approach to be used in virtual screening experiments. We have successfully implemented a variant of supervised learning algorithm, named multiple-instance learning, in order to predict bioactive conformers of a given molecule which are responsible for the observed biological activity. The implementation requires instance-based embedding, and joint feature selection and classification. The goal of the present project is to implement multiple-instance learning in drug activity prediction, and subsequently to identify the bioactive conformers for each molecule. 
The proposed approach was proven not to suffer from overfitting and to be highly competitive with classical predictive models, so it is very powerful for drug activity prediction. The approach was also validated as a useful method for pursuit of bioactive conformers

    PubChem3D: Diversity of shape

    Background: The shape diversity of 16.4 million biologically relevant molecules from the PubChem Compound database and their 1.46 billion diverse conformers was explored as a function of molecular volume.
    Results: The diversity of shape space was investigated by determining the shape similarity threshold to achieve a maximum on the count of reference shapes per unit of conformer volume. The rate of growth in shape space, as represented by a decreasing shape similarity threshold, was found to be remarkably smooth as a function of volume. There was no apparent correlation between the count of conformers per unit volume and their diversity, meaning that a single reference shape can describe the shape space of many chemical structures. The ability of a volume to describe the shape space of lesser volumes was also examined. It was shown that a given volume was able to describe 40-70% of the shape diversity of lesser volumes, for the majority of the volume range considered in this study.
    Conclusion: The relative growth of shape diversity as a function of volume and shape similarity is surprisingly uniform. Given the distribution of chemicals in PubChem versus what is theoretically synthetically possible, the results from this analysis should be considered a conservative estimate to the true diversity of shape space.

    Identification of a Novel Drug Lead That Inhibits HCV Infection and Cell-to-Cell Transmission by Targeting the HCV E2 Glycoprotein

    Hepatitis C Virus (HCV) infects 200 million individuals worldwide. Although several FDA approved drugs targeting the HCV serine protease and polymerase have shown promising results, there is a need for better drugs that are effective in treating a broader range of HCV genotypes and subtypes without being used in combination with interferon and/or ribavirin. Recently, two crystal structures of the core of the HCV E2 protein (E2c) have been determined, providing structural information that can now be used to target the E2 protein and develop drugs that disrupt the early stages of HCV infection by blocking E2’s interaction with different host factors. Using the E2c structure as a template, we have created a structural model of the E2 protein core (residues 421–645) that contains the three amino acid segments that are not present in either structure. Computational docking of a diverse library of 1,715 small molecules to this model led to the identification of a set of 34 ligands predicted to bind near conserved amino acid residues involved in the HCV E2: CD81 interaction. Surface plasmon resonance detection was used to screen the ligand set for binding to recombinant E2 protein, and the best binders were subsequently tested to identify compounds that inhibit the infection of Huh-7 cells by HCV. One compound, 281816, blocked E2 binding to CD81 and inhibited HCV infection in a genotype-independent manner with IC50’s ranging from 2.2 µM to 4.6 µM. 281816 blocked the early and late steps of cell-free HCV entry and also abrogated the cell-to-cell transmission of HCV. Collectively the results obtained with this new structural model of E2c suggest the development of small molecule inhibitors such as 281816 that target E2 and disrupt its interaction with CD81 may provide a new paradigm for HCV treatment

    Study of ligand-based virtual screening tools in computer-aided drug design

    Virtual screening is a central technique in drug discovery today. Millions of molecules can be tested in silico with the aim to only select the most promising and test them experimentally. The topic of this thesis is ligand-based virtual screening tools which take existing active molecules as starting point for finding new drug candidates. One goal of this thesis was to build a model that gives the probability that two molecules are biologically similar as function of one or more chemical similarity scores. Another important goal was to evaluate how well different ligand-based virtual screening tools are able to distinguish active molecules from inactives. One more criterion set for the virtual screening tools was their applicability in scaffold-hopping, i.e. finding new active chemotypes. In the first part of the work, a link was defined between the abstract chemical similarity score given by a screening tool and the probability that the two molecules are biologically similar. These results help to decide objectively which virtual screening hits to test experimentally. The work also resulted in a new type of data fusion method when using two or more tools. In the second part, five ligand-based virtual screening tools were evaluated and their performance was found to be generally poor. Three reasons for this were proposed: false negatives in the benchmark sets, active molecules that do not share the binding mode, and activity cliffs. In the third part of the study, a novel visualization and quantification method is presented for evaluation of the scaffold-hopping ability of virtual screening tools.

    Modeling Chemical Interaction Profiles: II. Molecular Docking, Spectral Data-Activity Relationship, and Structure-Activity Relationship Models for Potent and Weak Inhibitors of Cytochrome P450 CYP3A4 Isozyme

    Polypharmacy increasingly has become a topic of public health concern, particularly as the U.S. population ages. Drug labels often contain insufficient information to enable the clinician to safely use multiple drugs. Because many of the drugs are bio-transformed by cytochrome P450 (CYP) enzymes, inhibition of CYP activity has long been associated with potentially adverse health effects. In an attempt to reduce the uncertainty pertaining to CYP-mediated drug-drug/chemical interactions, an interagency collaborative group developed a consensus approach to prioritizing information concerning CYP inhibition. The consensus involved computational molecular docking, spectral data-activity relationship (SDAR), and structure-activity relationship (SAR) models that addressed the clinical potency of CYP inhibition. The models were built upon chemicals that were categorized as either potent or weak inhibitors of the CYP3A4 isozyme. The categorization was carried out using information from clinical trials because currently available in vitro high-throughput screening data were not fully representative of the in vivo potency of inhibition. During categorization it was found that compounds, which break the Lipinski rule of five by molecular weight, were about twice more likely to be inhibitors of CYP3A4 compared to those, which obey the rule. Similarly, among inhibitors that break the rule, potent inhibitors were 2–3 times more frequent. The molecular docking classification relied on logistic regression, by which the docking scores from different docking algorithms, CYP3A4 three-dimensional structures, and binding sites on them were combined in a unified probabilistic model. The SDAR models employed a multiple linear regression approach applied to binned 1D 13C-NMR and 1D 15N-NMR spectral descriptors. Structure-based and physical-chemical descriptors were used as the basis for developing SAR models by the decision forest method. 
Thirty-three potent inhibitors and 88 weak inhibitors of CYP3A4 were used to train the models. Using these models, a synthetic majority rules consensus classifier was implemented, while the confidence of estimation was assigned following the percent agreement strategy. The classifier was applied to a testing set of 120 inhibitors not included in the development of the models. Five compounds of the test set, including known strong inhibitors dalfopristin and tioconazole, were classified as probable potent inhibitors of CYP3A4. Other known strong inhibitors, such as lopinavir, oltipraz, quercetin, raloxifene, and troglitazone, were among 18 compounds classified as plausible potent inhibitors of CYP3A4. The consensus estimation of inhibition potency is expected to aid in the nomination of pharmaceuticals, dietary supplements, environmental pollutants, and occupational and other chemicals for in-depth evaluation of the CYP3A4 inhibitory activity. It may serve also as an estimate of chemical interactions via CYP3A4 metabolic pharmacokinetic pathways occurring through polypharmacy and nutritional and environmental exposures to chemical mixtures
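    The majority-rules consensus with a percent-agreement confidence described in this abstract can be sketched as follows. This is an illustration only: the abstract does not say how ties are handled or how agreement levels map to the "probable" and "plausible" categories, so the function name, the tie handling, and the returned values are assumptions rather than the authors' procedure.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def consensus(votes: Sequence[str]) -> Tuple[Optional[str], float]:
    """Majority vote over per-model labels, with fraction of agreement.

    Returns (label, agreement), where agreement is the share of models
    that voted for the winning label. A tie yields (None, agreement);
    that convention is an assumption, not taken from the abstract.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    ranked = Counter(votes).most_common()
    top_label, top_count = ranked[0]
    agreement = top_count / len(votes)
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None, agreement
    return top_label, agreement


# Hypothetical labels from the docking, SDAR, and SAR models for one compound.
label, agreement = consensus(["potent", "potent", "weak"])
```

    With three models and two classes a tie cannot occur, so the tie branch only matters if the set of models or labels is extended.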

    Discovery of new scaffolds for GABA(A) receptor modulators from natural origin

    Gamma-aminobutyric acid type A (GABAA) receptors are the major inhibitory neurotransmitter receptors in the central nervous system (CNS). These heteropentameric transmembrane proteins act as chloride ion channel upon activation by the endogenous ligand γ-amino butyric acid (GABA). Until now, 11 distinct GABAA receptor subtypes have been identified in the human brain. They differ in their subunit stoichiometry, tissue localization, functional characteristics, and pharmacological properties. Many CNS depressant drugs, such as the benzodiazepines exert their action via enhancement of the GABAergic neuronal inhibition. However, therapy may be accompanied by unwanted side-effects and specific clinical action is precluded due to the lack of GABAA receptor subtype selectivity. In a preliminary screen the lipophilic extracts of Piper nigrum fruits, Angelica pubescens roots, Acorus calamus roots, Biota orientalis leaves and twigs, and Kadsura longipedunculata fruits had shown positive GABAA receptor modulating activity in an in vitro functional, automated two-microelectrode voltage clamp assay with Xenopus laevis oocytes, which transiently expressed α1β2γ2S GABAA receptors. Aiming at the discovery of new scaffolds which act at the GABAA receptor, the active constituents of these five plant extracts were identified by means of an HPLC-based activity profiling approach. In total, we discovered 28 secondary metabolites with positive GABAA receptor modulating properties belonging to the structural classes of coumarins, monoterpenes, sesquiterpenes, diterpenes, phenylpropanes, piperamides, and lignans. Their structures were elucidated by a combination of powerful analytical methods such as HPLC-PDA-TOF-MS, highly sensitive microprobe NMR, and for chiral compounds, polarimetry and ECD. Determination of relative and absolute configuration was supported by conformational analysis and quantum chemical calculations. 
Furthermore, three yet unknown natural products could be identified. HPLC-based activity profiling with P. nigrum enabled the identification of 13 structurally related piperamides with minimum amount of extract. This allowed us to draw preliminary structure activity considerations for the scaffold of piperine, which was the main α1β2γ2S GABAA receptor modulator in this plant (EC50: 52.4 ± 9.4 μM, maximal stimulation of GABA induced chloride currents (IGABA) by 302% ± 27%). Sandaracopimaric acid and isopimaric acid from B. orientalis were tested for subtype selectivity at α1-3,5β1-3γ2S subtypes which revealed a comparatively high efficiency of both compounds at α2/3-subunit containing receptors. Additionally, sandaracopimaric acid exerted superior efficiency at receptors comprising β2-subunits. It showed EC50 values from 24.9 ± 6.3 μM to 82.2 ± 46.6 μM, and efficiencies ranging between 502% ± 56% to 1101% ± 98% potentiation of IGABA at the subtypes of investigation. A decrease of locomotor activity in the Open Field behavioral model was observed after intraperitoneal injection of 3 to 30 mg sandaracopimaric acid per kg bodyweight in mice. A trend towards anxiolytic-like activity could be observed with 1 and 3 mg/kg. Further “drug-like” GABAA receptor modulating scaffolds were discovered among the lignans from K. longipedunculata (potencies down to 12.8 ± 3.1 μM and efficiencies up to 886 ± 291% stimulation of IGABA) and among the sesquiterpenes from A. calamus (potencies down to 34.0 ± 6.7 μM and efficiencies up to 886 ± 105% stimulation of IGABA). These substances have potential for the further development as therapeutics acting at the GABAA receptor

    In Silico Design and Selection of CD44 Antagonists: implementation of computational methodologies in drug discovery and design

    Drug discovery (DD) is a process that aims to identify drug candidates through a thorough evaluation of the biological activity of small molecules or biomolecules. Computational strategies (CS) are now necessary tools for speeding up DD. Chapter 1 describes the use of CS throughout the DD process, from the early stages of drug design to the use of artificial intelligence for the de novo design of therapeutic molecules. Chapter 2 describes an in-silico workflow for identifying potential high-affinity CD44 antagonists, ranging from structural analysis of the target to the analysis of ligand-protein interactions and molecular dynamics (MD). In Chapter 3, we tested the shape-guided algorithm on a dataset of macrocycles, identifying the characteristics that need to be improved for the development of new tools for macrocycle sampling and design. In Chapter 4, we describe a detailed reverse docking protocol for identifying potential 4-hydroxycoumarin (4-HC) targets. The strategy described in this chapter is easily transferable to other compounds and protein datasets for overcoming bottlenecks in molecular docking protocols, particularly reverse docking approaches. Finally, Chapter 5 shows how computational methods and experimental results can be used to repurpose compounds as potential COVID-19 treatments. According to our findings, the HCV drug boceprevir could be clinically tested or used as a lead molecule to develop compounds that target COVID-19 or other coronaviral infections. These chapters, in summary, demonstrate the importance, application, limitations, and future of computational methods in the state-of-the-art drug design process

    Identification of drug leads against HCV and malaria using different target proteins

    Hepatitis C Virus (HCV) infects 170 million individuals worldwide. Although several newly FDA approved drugs targeting the HCV serine protease and polymerase have shown promising results, there is a need for better drugs that are effective in treating all HCV genotypes and subtypes to be used in an interferon-free regimen. On the other hand, malaria is another public health burden that causes 219 million clinical episodes, and 660,000 deaths per year. In addition, 3.3 billion people live in areas at risk of malaria transmission in 106 countries. It is alarming that 86% of deaths caused by malaria globally were in children. Several challenges are faced when treating malaria, such as resistance against drugs that are used in treatment. This necessitates the development of new classes of drugs to overcome resistance. CD81 is a target protein that plays an essential role in the internalization of HCV into hepatocytes. Thus it was also targeted to identify sets of small molecule ligands predicted to bind to several sites that were identified to be involved in HCV infection. Thirty-six ligands predicted by AutoDock to bind to these sites were tested experimentally to determine if they bound to CD81-LEL. Binding assays conducted using surface plasmon resonance revealed that 23 out of 36 of the ligands bound in vitro to the recombinant CD81-LEL protein. In an effort to create new drugs that block hepatitis C virus entry into hepatocytes, we have designed and synthesized a small molecule that targets the HCV E2 glycoprotein binding site on CD81. A selective high affinity ligand (SHAL) (11) was created by linking together two small molecules that were predicted by docking and were shown by experimental methods to bind to the same site on CD81 where E2 binds. SH7153 was found to bind to recombinant CD81-LEL with a Kd of 21 µM but was not found to inhibit HCV infection when tested using Raji cells (antibody neutralizing assays) and HCV infection inhibition assays. 
This led to the conclusion that the linkers’ lengths should be optimized so as to have a SHAL that fits properly in the desired binding sites. The HCV glycoprotein E2 has also been shown to play an essential role in hepatocyte invasion by binding to CD81 and other cell surface receptors. Recently, two research groups resolved the core structure of HCV E2, providing structural information that can now be used to target the E2 protein and develop drugs that disrupt the early stages of HCV infection by blocking E2’s interaction with different host factors. By targeting conserved E2 residues among different genotypes and subtypes in the CD81 binding site on HCV E2, one might also be able to develop drugs that block HCV infection in a genotype-independent manner. Using the E2c structure as a template, we have used homology modeling methods to develop a structural model of the E2 protein core (residues 421-645) that includes the three amino acid segments that are not present in the E2c structure. Blind docking to this model was then performed using a library of ~4000 small molecules, and a set of 40 ligands predicted to bind near conserved amino acid residues involved in the HCV E2:CD81 interaction was selected for experimental testing. Surface plasmon resonance was used to screen the ligands for binding to recombinant E2 protein and the best binders were subsequently tested to identify compounds that inhibit the infection of hepatocytes by HCV. One compound, 281816, inhibited infection by HCV genotypes 1a, 1b, 2a, 2b, 4a and 6a with IC50’s ranging from 2.2 µM to 4.6 µM. Such inhibitors may represent a new paradigm for HCV treatment. In an attempt to make 281816 more promising, a SHAL prototype was designed using an analogue of 281816 (SH2216). It would be tempting to test the SHAL inhibitory effect and compare it to that of 281816. 
To date, human CD81 (hCD81) is the only human surface protein known to play a role in the process by which sporozoites of several Plasmodium species infect human hepatocytes. Blocking a human receptor that is exploited for the entry process of pathogens has been proven to be a good strategy for fighting drug-resistant mutants. Hence, we targeted the 21 amino acid stretch on CD81 large extracellular loop that was found to be involved in Plasmodium yoelii invasion via virtual screening runs, preliminary binding assays and sporozoite invasion assays. This led to the identification of 4 drug leads that range between moderate and strong inhibitors of infection by Plasmodium yoelii and Plasmodium falciparum. Additionally, one ligand was found to potentiate the invasion of Plasmodium yoelii