PubChem3D: Shape compatibility filtering using molecular shape quadrupoles

Author: Bolton Evan E
Bryant Stephen H
Kim Sunghwan
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract
Background: PubChem provides a 3-D neighboring relationship, which involves finding the maximal shape overlap between two static compound 3-D conformations, a computationally intensive step. It is highly desirable to avoid this overlap computation, especially if it can be determined with certainty that a conformer pair cannot meet the criteria to be a 3-D neighbor. As such, PubChem employs a series of pre-filters, based on the concept of volume, to remove approximately 65% of all conformer neighbor pairs prior to shape overlap optimization. Given that molecular volume, a somewhat vague concept, is rather effective, it leads one to wonder: can the existing PubChem 3-D neighboring relationship, which consists of billions of shape similar conformer pairs from tens of millions of unique small molecules, be used to identify additional shape descriptor relationships? Or, put more specifically, can one place an upper bound on shape similarity using other "fuzzy" shape-like concepts like length, width, and height?
Results: Using a basis set of 4.18 billion 3-D neighbor pairs identified from single conformer per compound neighboring of 17.1 million molecules, shape descriptors were computed for all conformers. These steric shape descriptors included several forms of molecular volume and shape quadrupoles, which essentially embody the length, width, and height of a conformer. For a given 3-D neighbor conformer pair, the volume and each quadrupole component (Qx, Qy, and Qz) were binned and their frequency of occurrence was examined. Per molecular volume type, this effectively produced three different maps, one per quadrupole component (Qx, Qy, and Qz), of allowed values for the similarity metric, shape Tanimoto (ST) ≥ 0.8. The efficiency of these relationships (in terms of true positive, true negative, false positive and false negative) as a function of ST threshold was determined in a test run of 13.2 billion conformer pairs not previously considered by the 3-D neighbor set. At an ST ≥ 0.8, a filtering efficiency of 40.4% of true negatives was achieved with only 32 false negatives out of 24 million true positives, when applying the separate Qx, Qy, and Qz maps in a series (Qxyz). This efficiency increased linearly as a function of ST threshold in the range 0.8-0.99. The Qx filter was consistently the most efficient, followed by Qy and then by Qz. Use of a monopole volume showed the best overall performance, followed by the self-overlap volume and then by the analytic volume. Application of the monopole-based Qxyz filter in a "real world" test of 3-D neighboring of 4,218 chemicals of biomedical interest against 26.1 million molecules in PubChem reduced the total CPU cost of neighboring by 24-38% and, if used as the initial filter, removed from consideration 48.3% of all conformer pairs at almost negligible computational overhead.
Conclusion: Basic shape descriptors, such as those embodied by size, length, width, and height, can be highly effective in identifying shape incompatible compound conformer pairs. When performing a 3-D search using a shape similarity cut-off, computation can be avoided by identifying conformer pairs that cannot meet the result criteria. Applying this methodology as a filter for PubChem 3-D neighboring computation, an improvement of 31% was realized, increasing the average conformer pair throughput from 154,000 to 202,000 per second per CPU core.
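To make the filtering idea in this abstract concrete, the following is a minimal Python sketch, not the PubChem implementation. It uses the standard shape Tanimoto definition, ST = V_AB / (V_AA + V_BB - V_AB). Several things are assumptions made for illustration and do not come from the abstract: the volume bound ST ≤ min(V)/max(V) as the form of the volume pre-filter, the layout of each map as a set of allowed bin pairs, the bin width, and all names (`QuadrupoleMap`, `passes_prefilter`, the `(volume, qx, qy, qz)` tuple).

```python
def shape_tanimoto(v_aa, v_bb, v_ab):
    """Standard shape Tanimoto from self-overlap and mutual overlap volumes."""
    return v_ab / (v_aa + v_bb - v_ab)


def volume_upper_bound(v_a, v_b):
    """Upper bound on ST: the overlap cannot exceed the smaller volume,
    so ST <= min(v_a, v_b) / max(v_a, v_b)."""
    return min(v_a, v_b) / max(v_a, v_b)


class QuadrupoleMap:
    """Hypothetical map of allowed bin pairs for one quadrupole component.

    A bin pair is 'allowed' if it was observed at least once among known
    3-D neighbor pairs (the training set).
    """

    def __init__(self, bin_width):
        self.bin_width = bin_width
        self.allowed = set()

    def _bin(self, value):
        return int(value // self.bin_width)

    def train(self, value_a, value_b):
        i, j = self._bin(value_a), self._bin(value_b)
        # Neighboring is symmetric, so store both orientations.
        self.allowed.add((i, j))
        self.allowed.add((j, i))

    def compatible(self, value_a, value_b):
        return (self._bin(value_a), self._bin(value_b)) in self.allowed


def passes_prefilter(conf_a, conf_b, maps, st_threshold=0.8):
    """Apply the volume bound, then the Qx, Qy, Qz maps in series.

    Each conformer is a (volume, qx, qy, qz) tuple. Returning False means
    the pair can be skipped without running the overlap optimization.
    """
    if volume_upper_bound(conf_a[0], conf_b[0]) < st_threshold:
        return False
    for component, qmap in zip((1, 2, 3), maps):
        if not qmap.compatible(conf_a[component], conf_b[component]):
            return False
    return True


# Toy training set: one known neighbor pair (values are made up).
maps = [QuadrupoleMap(bin_width=1.0) for _ in range(3)]
known_a = (500.0, 10.2, 4.1, 2.3)
known_b = (480.0, 10.8, 4.6, 2.1)
for component, qmap in zip((1, 2, 3), maps):
    qmap.train(known_a[component], known_b[component])
```

With this toy map, a candidate whose Qx falls in a bin never seen alongside the query's Qx bin is rejected before any overlap optimization runs, which is the source of the savings the abstract reports. A real map would be trained on billions of pairs and would need a policy for sparsely populated bins.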

Improving protein structure similarity searches using domain boundaries based on conserved sequence information

Author: Bryant Stephen H
Madej Tom
Thompson Kenneth Evan
Wang Yanli
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract
Background: The identification of protein domains plays an important role in protein structure comparison. Domain query size and composition are critical to structure similarity search algorithms such as the Vector Alignment Search Tool (VAST), the method employed for computing related protein structures in the NCBI Entrez system. Currently, domains identified on the basis of structural compactness are used for VAST computations. In this study, we have investigated how alternative definitions of domains derived from conserved sequence alignments in the Conserved Domain Database (CDD) would affect the domain comparisons and structure similarity search performance of VAST.
Results: Alternative domains, which have significantly different secondary structure composition from those based on structurally compact units, were identified based on the alignment footprints of curated protein sequence domain families. Our analysis indicates that domain boundaries disagree on roughly 8% of protein chains in the medium redundancy subset of the Molecular Modeling Database (MMDB). These conflicting sequence-based domain boundaries perform slightly better than structure domains in structure similarity searches, and there are interesting cases in which structure similarity search performance is markedly improved.
Conclusion: Structure similarity searches using domain boundaries based on conserved sequence information can provide an additional method for investigators to identify interesting similarities between proteins with known structures. Because of the improvement in performance of structure similarity searches using sequence domain boundaries, we are in the process of implementing their inclusion into the VAST search and MMDB resources in the NCBI Entrez system.

Crossref

Predicting drug–drug interactions through drug structural similarities and interaction networks incorporating pharmacokinetics and pharmacodynamics knowledge

Author: Ming Hao
Stephen H. Bryant
Takako Takeda
Tiejun Cheng
Yanli Wang
Publication venue: Springer Nature
Publication date: 01/01/2017

Additional file 1. Table S1. Average structural similarity scores for the DDI/non–DDI pairs in the network of each De. Table S2-1. Top 10 predicted drugs with DDIs for warfarin. Table S2-2. Top 10 predicted drugs with DDIs for simvastatin. Table S3. Four-fold cross-validation test results. Text S1. Drugs that show DDI (DrugBank ID). Figure S1. Illustration of construction of training and test set for 4-fold cross validation. Figure S2. ROC curves using the models with score set 1 in a 4-fold validation

eScholarship - University of California

FigShare

Phase 2 trial of montelukast for prevention of pain in sickle cell disease.

Author: Brandow Amanda
Bryant Valencia
DeBaun Michael R
Embury Stephen H
Field Joshua J
Kassim Adetola
Matsui Neil
Simpson Pippa
Wilkerson Karina
Zhang Liyun
Publication venue: eScholarship, University of California
Publication date: 01/03/2020

Cysteinyl leukotrienes (CysLTs) are lipid mediators of inflammation. In patients with sickle cell disease (SCD), levels of CysLTs are increased compared with controls and associated with a higher rate of hospitalization for pain. We tested the hypothesis that administration of the CysLT receptor antagonist montelukast would improve SCD-related comorbidities, including pain, in adolescents and adults with SCD. In a phase 2 randomized trial, we administered montelukast or placebo for 8 weeks. The primary outcome measure was a >30% reduction in soluble vascular cell adhesion molecule 1 (sVCAM), a marker of vascular injury. Secondary outcome measures were reduction in daily pain, improvement in pulmonary function, and improvement in microvascular blood flow, as measured by laser Doppler velocimetry. Forty-two participants with SCD were randomized to receive montelukast or placebo for 8 weeks. We found no difference between the montelukast and placebo groups with regard to the levels of sVCAM, reported pain, pulmonary function, or microvascular blood flow. Although montelukast is an effective treatment for asthma, we did not find benefit for SCD-related outcomes. This clinical trial was registered at www.clinicaltrials.gov as #NCT01960413

Knowledge-based annotation of small molecule binding sites in proteins

Author: Bryant Stephen H
Madej Thomas
Panchenko Anna R
Shoemaker Benjamin A
Thangudu Ratna R
Tyagi Manoj
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract
Background: The study of protein-small molecule interactions is vital for understanding protein function and for practical applications in drug discovery. To benefit from the rapidly increasing structural data, it is essential to improve the tools that enable large scale binding site prediction with greater emphasis on their biological validity.
Results: We have developed a new method for the annotation of protein-small molecule binding sites, using inference by homology, which allows us to extend annotation onto protein sequences without experimental data available. To ensure biological relevance of binding sites, our method clusters similar binding sites found in homologous protein structures based on their sequence and structure conservation. Binding sites which appear evolutionarily conserved among non-redundant sets of homologous proteins are given higher priority. After binding sites are clustered, position specific score matrices (PSSMs) are constructed from the corresponding binding site alignments. Together with other measures, the PSSMs are subsequently used to rank binding sites to assess how well they match the query and to better gauge their biological relevance. The method also facilitates a succinct and informative representation of observed and inferred binding sites from homologs with known three-dimensional structures, thereby providing the means to analyze conservation and diversity of binding modes. Furthermore, the chemical properties of small molecules bound to the inferred binding sites can be used as a starting point in small molecule virtual screening. The method was validated by comparison to other binding site prediction methods and to a collection of manually curated binding site annotations. We show that our method achieves a sensitivity of 72% at predicting biologically relevant binding sites and can accurately discriminate those sites that bind biological small molecules from non-biological ones.
Conclusions: A new algorithm has been developed to predict binding sites with high accuracy in terms of their biological validity. It also provides a common platform for function prediction, knowledge-based docking and for small molecule virtual screening. The method can be applied even for a query sequence without structure. The method is available at http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.cgi.
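The PSSM step mentioned in this abstract can be illustrated with a short Python sketch. It is a generic log-odds PSSM, not the authors' procedure: the uniform background frequency, the pseudocount of 1, the ungapped equal-length site strings, and the function names `build_pssm` and `score_query` are all assumptions made for the example.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_sites, pseudocount=1.0):
    """Build a log-odds PSSM from aligned binding-site residue strings.

    aligned_sites: equal-length strings, one per clustered binding site,
    where column k holds the residue at site position k.
    Returns a list of dicts (one per column) mapping residue -> log2 odds.
    """
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all aligned sites must have the same length")

    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    pssm = []
    for col in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for site in aligned_sites:
            residue = site[col]
            if residue in counts:  # ignore gaps and unknown symbols
                counts[residue] += 1.0
        total = sum(counts.values())
        pssm.append(
            {aa: math.log2((c / total) / background) for aa, c in counts.items()}
        )
    return pssm


def score_query(pssm, query_residues):
    """Sum the per-column scores of the query residues at the site positions."""
    return sum(column.get(res, 0.0) for column, res in zip(pssm, query_residues))


# Toy cluster of three aligned binding sites (made-up residues).
sites = ["HDS", "HDS", "HES"]
pssm = build_pssm(sites)
conserved_score = score_query(pssm, "HDS")
unrelated_score = score_query(pssm, "AAA")
```

A query whose residues match the conserved positions scores higher than an unrelated one, which is the kind of signal that could feed into ranking binding sites by how well they match the query.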

Crossref

Refining multiple sequence alignments with conserved core regions

Author: Bryant Stephen H.
Chakrabarti Saikat
Lanczycki Christopher J.
Panchenko Anna R.
Przytycka Teresa M.
Thiessen Paul A.
Publication venue: Oxford University Press
Publication date: 01/01/2006

Accurate multiple sequence alignments of proteins are very important to several areas of computational biology and provide an understanding of phylogenetic history of domain families, their identification and classification. This article presents a new algorithm, REFINER, that refines a multiple sequence alignment by iterative realignment of its individual sequences with the predetermined conserved core (block) model of a protein family. Realignment of each sequence can correct misalignments between a given sequence and the rest of the profile and at the same time preserves the family's overall block model. Large-scale benchmarking studies showed a noticeable improvement of alignment after refinement. This can be inferred from the increased alignment score and enhanced sensitivity for database searching using the sequence profiles derived from refined alignments compared with the original alignments. A standalone version of the program is available by ftp distribution () and will be incorporated into the next release of the Cn3D structure/alignment viewer

CiteSeerX

A Case of Infectious Purpura Fulminans: An Unusual Organism and Method of Diagnosis

Author: Bryant Catherine
Eng Pei Chia
Jackson Stephen H. D.
Publication venue: SMC Media
Publication date: 01/01/2014

Infectious purpura fulminans is a rapidly progressive skin necrosis that carries a mortality rate of 30%. Here, we describe a case of infectious purpura fulminans caused by Capnocytophaga, diagnosed by a blood film.