Tautomerism in large databases
We have used the Chemical Structure DataBase (CSDB) of the NCI CADD Group, an aggregated collection of over 150 small-molecule databases totaling 103.5 million structure records, to conduct tautomerism analyses on one of the largest currently existing sets of real (i.e. not computer-generated) compounds. This analysis was carried out using calculable chemical structure identifiers developed by the NCI CADD Group, based on hash codes available in the chemoinformatics toolkit CACTVS and a newly developed scoring scheme to define a canonical tautomer for any encountered structure. CACTVS’s tautomerism definition, a set of 21 transform rules expressed in SMIRKS line notation, was used, which takes a comprehensive stance as to the possible types of tautomeric interconversion included. Tautomerism was found to be possible for more than 2/3 of the unique structures in the CSDB. A total of 680 million tautomers were calculated from, and including, the original structure records. Tautomerism overlap within the same individual database (i.e. at least one other entry was present that was really only a different tautomeric representation of the same compound) was found at an average rate of 0.3% of the original structure records, with values as high as nearly 2% for some of the databases in CSDB. Projected onto the set of unique structures (by FICuS identifier), this still occurred in about 1.5% of the cases. Tautomeric overlap across all constituent databases in CSDB was found for nearly 10% of the records in the collection
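The within-database overlap rate described above can be illustrated with a minimal sketch. It assumes each record already carries two precomputed keys: one for the structure as drawn and one for its canonical tautomer. The field names `structure_key` and `tautomer_key` and the toy records are hypothetical; they stand in for the CACTVS-based identifiers and are not the authors' implementation.

```python
from collections import defaultdict


def tautomer_overlap_rate(records):
    """Fraction of records that share a canonical tautomer with a
    differently drawn record in the same database.

    records: iterable of (database, structure_key, tautomer_key) tuples,
    where both keys are assumed to be precomputed identifiers.
    """
    records = list(records)
    if not records:
        return 0.0

    # Collect the distinct drawn structures per (database, canonical tautomer).
    groups = defaultdict(set)
    for database, structure_key, tautomer_key in records:
        groups[(database, tautomer_key)].add(structure_key)

    # A record overlaps when its group holds more than one drawn structure.
    overlapping = sum(
        1 for database, _, tautomer_key in records
        if len(groups[(database, tautomer_key)]) > 1
    )
    return overlapping / len(records)


# Hypothetical toy data: s1 and s2 are two tautomeric drawings of one compound.
toy_records = [
    ("dbA", "s1", "t1"),
    ("dbA", "s2", "t1"),
    ("dbA", "s3", "t2"),
    ("dbB", "s1", "t1"),
]
rate = tautomer_overlap_rate(toy_records)
```

Dropping the database name from the grouping key would give the cross-database variant of the same count.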
In silico assessment of potential druggable pockets on the surface of α1-Antitrypsin conformers
The search for druggable pockets on the surface of a protein is often performed on a single conformer, treated as a rigid body. Transient druggable pockets may be missed in this approach. Here, we describe a methodology for systematic in silico analysis of surface clefts across multiple conformers of the metastable protein α1-antitrypsin (A1AT). Pathological mutations disturb the conformational landscape of A1AT, triggering polymerisation that leads to emphysema and hepatic cirrhosis. Computational screens for small molecule inhibitors of polymerisation have generally focused on one major druggable site visible in all crystal structures of native A1AT. In an alternative approach, we scan all surface clefts observed in crystal structures of A1AT and in 100 computationally produced conformers, mimicking the native solution ensemble. We assess the persistence, variability and druggability of these pockets. Finally, we employ molecular docking using publicly available libraries of small molecules to explore scaffold preferences for each site. Our approach identifies a number of novel target sites for drug design. In particular one transient site shows favourable characteristics for druggability due to high enclosure and hydrophobicity. Hits against this and other druggable sites achieve docking scores corresponding to a Kd in the µM–nM range, comparing favourably with a recently identified promising lead. Preliminary ThermoFluor studies support the docking predictions. In conclusion, our strategy shows considerable promise compared with the conventional single pocket/single conformer approach to in silico screening. Our best-scoring ligands warrant further experimental investigation
Superimposé: a 3D structural superposition server
The Superimposé webserver performs structural similarity searches with a preference towards 3D structure-based methods. Similarities can be detected between small molecules (e.g. drugs), parts of large structures (e.g. binding sites of proteins) and entire proteins. For this purpose, a number of algorithms were implemented and various databases are provided. Superimposé assists the user regarding the selection of a suitable combination of algorithm and database. After the computation on our server infrastructure, a visual assessment of the results is provided. The structure-based in silico screening for similar drug-like compounds enables the detection of scaffold-hoppers with putatively similar effects. The possibility to find similar binding sites can be of special interest in the functional analysis of proteins. The search for structurally similar proteins allows the detection of similar folds with different backbone topology. The Superimposé server is available at: http://bioinformatics.charite.de/superimpose
Functional Group and Substructure Searching as a Tool in Metabolomics
BACKGROUND: A direct link between the names and structures of compounds and the functional groups contained within them is important, not only because biochemists frequently rely on literature that uses a free-text format to describe functional groups, but also because metabolic models depend upon the connections between enzymes and substrates being known and appropriately stored in databases. METHODOLOGY: We have developed a database named "Biochemical Substructure Search Catalogue" (BiSSCat), which contains 489 functional groups, >200,000 compounds and >1,000,000 different computationally constructed substructures, to allow identification of chemical compounds of biological interest. CONCLUSIONS: This database and its associated web-based search program (http://bisscat.org/) can be used to find compounds containing selected combinations of substructures and functional groups. It can be used to determine possible additional substrates for known enzymes and for putative enzymes found in genome projects. Its applications to enzyme inhibitor design are also discussed
Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry
BACKGROUND: Structure elucidation of unknown small molecules by mass spectrometry is a challenge despite advances in instrumentation. The first crucial step is to obtain correct elemental compositions. In order to automatically constrain the thousands of possible candidate structures, rules need to be developed to select the most likely and chemically correct molecular formulas. RESULTS: An algorithm for filtering molecular formulas is derived from seven heuristic rules: (1) restrictions for the number of elements, (2) LEWIS and SENIOR chemical rules, (3) isotopic patterns, (4) hydrogen/carbon ratios, (5) element ratio of nitrogen, oxygen, phosphorus, and sulphur versus carbon, (6) element ratio probabilities and (7) presence of trimethylsilylated compounds. Formulas are ranked according to their isotopic patterns and subsequently constrained by presence in public chemical databases. The seven rules were developed on 68,237 existing molecular formulas and were validated in four experiments. First, 432,968 formulas covering five million PubChem database entries were checked for consistency. Only 0.6% of these compounds did not pass all rules. Next, the rules were shown to effectively reduce the full complement of eight billion theoretically possible C, H, N, S, O, P-formulas up to 2000 Da to only 623 million most probable elemental compositions. Third, 6,000 pharmaceutical, toxic and natural compounds were selected from the DrugBank, TSCA and DNP databases. The correct formulas were retrieved as top hit at 80–99% probability when assuming data acquisition with complete resolution of unique compounds, 5% absolute isotope ratio deviation and 3 ppm mass accuracy. Last, some exemplary compounds were analyzed by Fourier transform ion cyclotron resonance mass spectrometry and by gas chromatography-time of flight mass spectrometry. In each case, the correct formula was ranked as top hit when combining the seven rules with database queries.
CONCLUSION: The seven rules enable an automatic exclusion of molecular formulas which are either wrong or which contain unlikely high or low numbers of elements. The correct molecular formula is assigned with a probability of 98% if the formula exists in a compound database. For truly novel compounds that are not present in databases, the correct formula is found in the first three hits with a probability of 65–81%. Corresponding software and supplemental data are available for download from the authors' website.
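A minimal sketch of how three such checks might be combined is shown below: a 3 ppm mass-accuracy test, a hydrogen/carbon ratio test, and a valence-parity test in the spirit of the SENIOR rule. This is not the authors' software. The function names, the simplified valence table, the neutral-molecule assumption and the default H/C bounds are illustrative assumptions; only the 3 ppm figure comes from the abstract.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONO_MASS = {
    "C": 12.0, "H": 1.00782503, "N": 14.0030740,
    "O": 15.9949146, "P": 30.9737620, "S": 31.9720707,
}
# Simplified valences (assumption; higher valence states are ignored).
VALENCE = {"C": 4, "H": 1, "N": 3, "O": 2, "P": 3, "S": 2}


def parse_formula(formula):
    """Turn 'C6H12O6' into {'C': 6, 'H': 12, 'O': 6}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def monoisotopic_mass(counts):
    return sum(MONO_MASS[element] * n for element, n in counts.items())


def ppm_error(measured, theoretical):
    return (measured - theoretical) / theoretical * 1e6


def plausible(formula, measured_mass, ppm_tol=3.0, hc_bounds=(0.2, 3.1)):
    """Apply three illustrative filters to a neutral candidate formula.

    hc_bounds are placeholder limits chosen for this sketch.
    """
    counts = parse_formula(formula)

    # Mass accuracy: candidate must match the measured mass within ppm_tol.
    if abs(ppm_error(measured_mass, monoisotopic_mass(counts))) > ppm_tol:
        return False

    # Hydrogen/carbon ratio must fall inside the chosen bounds.
    carbons = counts.get("C", 0)
    if carbons:
        ratio = counts.get("H", 0) / carbons
        if not hc_bounds[0] <= ratio <= hc_bounds[1]:
            return False

    # Parity check: the sum of valences must be even for a neutral molecule.
    valence_sum = sum(VALENCE[element] * n for element, n in counts.items())
    return valence_sum % 2 == 0


glucose_mass = monoisotopic_mass(parse_formula("C6H12O6"))
glucose_ok = plausible("C6H12O6", 180.0635)
```

In practice the surviving candidates would then be ranked by isotopic pattern fit and checked against compound databases, as the abstract describes.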
Structure-based classification and ontology in chemistry
BACKGROUND: Recent years have seen an explosion in the availability of data in the chemistry domain. With this information explosion, however, retrieving relevant results from the available information, and organising those results, become even harder problems. Computational processing is essential to filter and organise the available resources so as to better facilitate the work of scientists. Ontologies encode expert domain knowledge in a hierarchically organised machine-processable format. One such ontology for the chemical domain is ChEBI. ChEBI provides a classification of chemicals based on their structural features and a role or activity-based classification. An example of a structure-based class is 'pentacyclic compound' (compounds containing five-ring structures), while an example of a role-based class is 'analgesic', since many different chemicals can act as analgesics without sharing structural features. Structure-based classification in chemistry exploits elegant regularities and symmetries in the underlying chemical domain. As yet, there has been neither a systematic analysis of the types of structural classification in use in chemistry nor a comparison to the capabilities of available technologies. RESULTS: We analyze the different categories of structural classes in chemistry, presenting a list of patterns for features found in class definitions. We compare these patterns of class definition to tools which allow for automation of hierarchy construction within cheminformatics and within logic-based ontology technology, going into detail in the latter case with respect to the expressive capabilities of the Web Ontology Language and recent extensions for modelling structured objects. Finally we discuss the relationships and interactions between cheminformatics approaches and logic-based approaches.
CONCLUSION: Systems that perform intelligent reasoning tasks on chemistry data require a diverse set of underlying computational utilities including algorithmic, statistical and logic-based tools. For the task of automatic structure-based classification of chemical entities, essential to managing the vast swathes of chemical data being brought online, systems which are capable of hybrid reasoning combining several different approaches are crucial. We provide a thorough review of the available tools and methodologies, and identify areas of open research.
Identification of Anti-Malarial Compounds as Novel Antagonists to Chemokine Receptor CXCR4 in Pancreatic Cancer Cells
Despite recent advances in targeted therapies, patients with pancreatic adenocarcinoma continue to have poor survival highlighting the urgency to identify novel therapeutic targets. Our previous investigations have implicated chemokine receptor CXCR4 and its selective ligand CXCL12 in the pathogenesis and progression of pancreatic intraepithelial neoplasia and invasive pancreatic cancer; hence, CXCR4 is a promising target for suppression of pancreatic cancer growth. Here, we combined in silico structural modeling of CXCR4 to screen for candidate anti-CXCR4 compounds with in vitro cell line assays and identified NSC56612 from the National Cancer Institute's (NCI) Open Chemical Repository Collection as an inhibitor of activated CXCR4. Next, we identified that NSC56612 is structurally similar to the established anti-malarial drugs chloroquine and hydroxychloroquine. We evaluated these compounds in pancreatic cancer cells in vitro and observed specific antagonism of CXCR4-mediated signaling and cell proliferation. Recent in vivo therapeutic applications of chloroquine in pancreatic cancer mouse models have demonstrated decreased tumor growth and improved survival. Our results thus provide a molecular target and basis for further evaluation of chloroquine and hydroxychloroquine in pancreatic cancer. Historically safe in humans, chloroquine and hydroxychloroquine appear to be promising agents to safely and effectively target CXCR4 in patients with pancreatic cancer
A taxonomic backbone for the global synthesis of species diversity in the angiosperm order Caryophyllales
The Caryophyllales constitute a major lineage of flowering plants with approximately 12500 species in 39 families. A taxonomic backbone at the genus level is provided that reflects the current state of knowledge and accepts 749 genera for the order. A detailed review of the literature of the past two decades shows that enormous progress has been made in understanding overall phylogenetic relationships in Caryophyllales. The process of re-circumscribing families so that they are monophyletic appears to be largely complete and has led to the recognition of eight new families (Anacampserotaceae, Kewaceae, Limeaceae, Lophiocarpaceae, Macarthuriaceae, Microteaceae, Montiaceae and Talinaceae), while the phylogenetic evaluation of generic concepts is still well underway. As a result of this, the number of genera has increased by more than ten percent in comparison to the last complete treatments in the “Families and genera of vascular plants” series. A checklist with all currently accepted genus names in Caryophyllales, as well as nomenclatural references, type names and synonymy is presented. Notes indicate how extensively the respective genera have been studied in a phylogenetic context. The most diverse families at the generic level are Cactaceae and Aizoaceae, but 28 families comprise only one to six genera. This synopsis represents a first step towards the aim of creating a global synthesis of the species diversity in the angiosperm order Caryophyllales integrating the work of numerous specialists around the world.