Data shopping in an open marketplace: Introducing the Ontogrator web application for marking up data using ontologies and browsing using facets

By Norman Morrison, David Hancock, Lynette Hirschman, Peter Dawyndt, Bert Verslyppe, Nikos Kyrpides, Renzo Kottmann, Pelin Yilmaz, Frank Oliver Glöckner, Jeff Grethe, Tim Booth, Peter Sterk, Goran Nenadic and Dawn Field


In the future, we hope to see an open and thriving data market in which users can find and select data from a wide range of data providers. In such an open access market, data are products that must be packaged accordingly. Increasingly, eCommerce sellers present heterogeneous product lines to buyers using faceted browsing. Using this approach we have developed the Ontogrator platform, which allows for rapid retrieval of data in a way that would be familiar to any online shopper. Drawing on Knowledge Organization Systems (KOS), especially ontologies, Ontogrator uses text mining to mark up data and faceted browsing to help users navigate, query and retrieve data. Ontogrator offers the potential to impact scientific research in two major ways: 1) by significantly improving the retrieval of relevant information; and 2) by significantly reducing the time required to compose standard database queries and assemble information for further research. Here we present a pilot implementation developed in collaboration with the Genomic Standards Consortium (GSC) that includes content from the StrainInfo, GOLD, CAMERA, SILVA and PubMed databases. This implementation demonstrates the power of ontogration and highlights that the usefulness of this approach depends entirely on the quality of both the data and the KOS (ontologies) used. Ideally, the use and further expansion of this collaborative system will help to surface issues associated with the underlying quality of annotation and could lead to a systematic means for accessing integrated data resources.
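The two steps named in the abstract, marking up records with controlled-vocabulary terms and then browsing them by facet, can be illustrated with a minimal sketch. This is not the Ontogrator implementation: the vocabulary, the sample records and the function names (`mark_up`, `facet_counts`, `select`) are all hypothetical, and simple whole-word matching stands in for the text mining described above.

```python
import re
from collections import Counter

# Hypothetical mini-vocabulary: facet name -> {term label: surface forms}.
# Illustrative only; these terms are not taken from any real ontology.
VOCAB = {
    "environment": {"marine": ["marine", "sea water", "ocean"],
                    "soil": ["soil"]},
    "organism": {"bacteria": ["bacteria", "bacterium", "bacterial"]},
}

def mark_up(text):
    """Return {facet: set of term labels} found in text by whole-word matching."""
    tags = {}
    lowered = text.lower()
    for facet, terms in VOCAB.items():
        for label, forms in terms.items():
            if any(re.search(r"\b" + re.escape(f) + r"\b", lowered) for f in forms):
                tags.setdefault(facet, set()).add(label)
    return tags

def facet_counts(records, facet):
    """Count how many records carry each term of one facet."""
    counts = Counter()
    for rec in records:
        counts.update(rec["tags"].get(facet, ()))
    return counts

def select(records, **chosen):
    """Keep only records that carry every chosen facet=term pair."""
    return [r for r in records
            if all(term in r["tags"].get(facet, ()) for facet, term in chosen.items())]

# Made-up sample records standing in for entries from different databases.
RAW = [
    ("r1", "Marine bacterium isolated from ocean sediment"),
    ("r2", "Soil bacteria from a forest plot"),
    ("r3", "Metagenome of sea water sample"),
]
RECORDS = [{"id": rid, "text": text, "tags": mark_up(text)} for rid, text in RAW]
```

Calling `facet_counts(RECORDS, "environment")` gives the per-term tallies a facet sidebar would display, and `select(RECORDS, environment="marine", organism="bacteria")` narrows the result set the way clicking two facet values would. The sketch also makes the abstract's caveat concrete: a record is only findable if its text contains a form the vocabulary knows about.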

Topics: Short Genome Reports
Publisher: Michigan State University
Provided by: PubMed Central
