The Chemical Information Ontology: Provenance and Disambiguation for Chemical Data on the Biological Semantic Web

By Janna Hastings, Leonid Chepelev, Egon Willighagen, Nico Adams, Christoph Steinbeck and Michel Dumontier


Cheminformatics is the application of informatics techniques to solve chemical problems in silico. It plays an important role in computational research in many areas of biology, including metabolism, proteomics, and systems biology. A critical aspect of applying cheminformatics in these fields is the accurate exchange of data, which is increasingly accomplished through the use of ontologies. Ontologies are formal representations of objects and their properties, expressed in a logic-based ontology language, and many are currently being developed to represent objects across all the domains of science. Ontologies make it possible to define, classify, and query objects in a particular domain, so that intelligent computer applications can be built to support the work of scientists both within the domain of interest and across interrelated neighbouring domains. Modern chemical research relies on computational techniques to filter and organise data to maximise research productivity. The objects manipulated by these algorithms and procedures, as well as the algorithms and procedures themselves, enjoy a kind of virtual life within computers; we call these information entities. Here, we describe our work in developing an ontology of chemical information entities, with a primary focus on data-driven research and the integration of calculated properties (descriptors) of chemical entities within a semantic web context. Our ontology distinguishes algorithmic, or procedural, information from declarative, or factual, information, and places particular importance on annotating calculated data with its provenance. The Chemical Information Ontology is being developed as an open collaborative project. More details, together with a downloadable OWL file, are available at (license: CC-BY-SA)
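The distinction drawn in the abstract between procedural and declarative information, and the attachment of provenance to a calculated descriptor, can be illustrated with a minimal sketch. This is not the ontology itself and does not use its actual class names or identifiers; the class names, field names, and example values below (`Algorithm`, `SoftwareImplementation`, `DescriptorValue`, "example-toolkit", and so on) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Algorithm:
    """Procedural information: a specification of how to compute something."""
    name: str


@dataclass(frozen=True)
class SoftwareImplementation:
    """A concrete program (with a version) that implements an algorithm."""
    name: str
    version: str
    implements: Algorithm


@dataclass(frozen=True)
class DescriptorValue:
    """Declarative information: a calculated value about a chemical entity."""
    descriptor: str                      # e.g. "logP"
    value: float
    about: str                           # identifier of the chemical entity (here a SMILES string)
    produced_by: SoftwareImplementation  # provenance of the calculated value


def provenance(d: DescriptorValue) -> str:
    """Describe where a calculated value came from."""
    impl = d.produced_by
    return (f"{d.descriptor}={d.value} for {d.about}, computed by "
            f"{impl.name} {impl.version} using {impl.implements.name}")


# Hypothetical example data, for illustration only.
algo = Algorithm("example-logP-algorithm")
tool_a = SoftwareImplementation("example-toolkit", "1.0", algo)
tool_b = SoftwareImplementation("other-toolkit", "2.3", algo)

logp_a = DescriptorValue("logP", 1.5, "c1ccccc1", tool_a)
logp_b = DescriptorValue("logP", 1.5, "c1ccccc1", tool_b)

print(provenance(logp_a))
# Same descriptor, same value, same molecule, but different provenance,
# so the two records remain distinguishable.
print(logp_a == logp_b)
```

Because each value carries a reference to the software and algorithm that produced it, two records for the same descriptor of the same molecule can be told apart, which is the kind of disambiguation the title refers to.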

Topics: Research Article
Publisher: Public Library of Science
Provided by: PubMed Central
