Abstract. As Linked Open Data moves into a landscape of larger and larger ontologies with terabytes of related data, we must work on optimizing the performance of our tools. We are easily tempted to buy bigger machines, or roomfuls of little ones, to address the scalability problem. Yet careful analysis and evaluation of the shape and characteristics of our data—using metrics—often leads to dramatic improvements in performance. It is important to consider the size and distribution of data within a hierarchy, but the size and depth of the ontologies themselves (some as large as 500,000 classes) need to play a more prominent role in benchmarking and evaluation. We have therefore synthesized a set of representative ontologies that yield additional insight into load-time costs during benchmarking analysis. By further analyzing instance density and ontology evolution metrics, we have reduced the population time (including materialization of the transitive closure) for the NCBO Resource Index—a database of 16.4 billion annotations linking 2.4 million ontology terms to 3.5 million data elements—from one week to less than one hour on the same machine.