Optimize first, buy later: analyzing metrics to ramp-up very large knowledge bases

By Paea LePendu, Natalya F. Noy, Clement Jonquet, Paul R. Alexander, Nigam H. Shah and Mark A. Musen

Abstract

As Linked Open Data moves into a landscape of larger and larger ontologies with terabytes of related data, we must optimize the performance of our tools. It is tempting to buy bigger machines, or roomfuls of smaller ones, to address the scalability problem. Yet careful analysis and evaluation of the shape and characteristics of our data, using metrics, often leads to dramatic improvements in performance. The size and distribution of data within a hierarchy are important to consider, but the size and depth of the ontologies themselves (some as large as 500,000 classes) need to play a more prominent role in benchmarking and evaluation. We have therefore synthesized a set of representative ontologies that yields additional insight into load-time costs during benchmarking analysis. By further analyzing instance density and ontology evolution metrics, we reduced the population time (including materialization of the transitive closure) for the NCBO Resource Index, a database of 16.4 billion annotations linking 2.4 million ontology terms to 3.5 million data elements, from one week to less than one hour on the same machine.
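
The headline result, cutting population time from one week to under an hour, rests on precomputing the transitive closure of each ontology's subclass hierarchy, so that ancestor lookups become simple table reads rather than recursive traversals at query time. The Python sketch below only illustrates that idea under assumed structures; the PARENTS mapping and the ancestors function are hypothetical stand-ins, and the paper's actual closure is materialized inside a relational database, not in Python.

    # Illustrative sketch: materializing the transitive closure of a
    # subclass hierarchy. PARENTS is a hypothetical direct-superclass
    # table, not data or code from the paper.
    from functools import lru_cache

    PARENTS = {
        "Melanoma": ("SkinCancer",),
        "SkinCancer": ("Cancer",),
        "Cancer": ("Disease",),
        "Disease": (),
    }

    @lru_cache(maxsize=None)
    def ancestors(cls: str) -> frozenset:
        """All superclasses reachable from cls, i.e. its transitive closure."""
        result = set()
        for parent in PARENTS.get(cls, ()):
            result.add(parent)
            result |= ancestors(parent)
        return frozenset(result)

    # Flatten into (class, ancestor) rows, as a database closure table
    # would store them.
    closure = [(c, a) for c in PARENTS for a in sorted(ancestors(c))]
    print(closure)

Once such a closure table exists, a request like "all annotations for Melanoma or any of its superclasses" needs no recursion at query time; the cost is paid once during population, which is the load-time cost the abstract analyzes.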

Year: 2013
OAI identifier: oai:CiteSeerX.psu:10.1.1.352.9976
Provided by: CiteSeerX