
Automatically generating data linkages using a domain-independent candidate selection approach

By Dezhao Song and Jeff Heflin

Abstract

One challenge for Linked Data is scalably establishing high-quality owl:sameAs links between instances (e.g., people, geographical locations, publications) in different data sources. Traditional approaches to this entity coreference problem do not scale because they exhaustively compare every pair of instances. In this paper, we propose a candidate selection algorithm for pruning the search space for entity coreference. We select candidate instance pairs by computing a character-level similarity on discriminating literal values that are chosen using domain-independent unsupervised learning. We index the instances on the chosen predicates' literal values to efficiently look up similar instances. We evaluate our approach on two RDF and three structured datasets. We show that the traditional metrics don't always accurately reflect the relative benefits of candidate selection, and propose additional metrics. We show that our algorithm frequently outperforms alternatives and is able to process 1 million instances in under one hour on a single Sun Workstation. Furthermore, on the RDF datasets, we show that the entire entity coreference process scales well by applying our technique. Surprisingly, this high-recall, low-precision filtering mechanism frequently leads to higher F-scores in the overall system.
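The general idea of index-based candidate selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes character bigrams, Jaccard similarity, and a fixed threshold of 0.5, and it skips the unsupervised selection of discriminating predicates entirely. The function names (`char_ngrams`, `build_index`, `select_candidates`) and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def char_ngrams(text, n=2):
    """Return the set of lowercase character n-grams of a literal value."""
    s = text.lower().strip()
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def build_index(instances, n=2):
    """Inverted index: n-gram -> ids of instances whose literal contains it."""
    index = defaultdict(set)
    for inst_id, literal in instances.items():
        for gram in char_ngrams(literal, n):
            index[gram].add(inst_id)
    return index


def select_candidates(instances, threshold=0.5, n=2):
    """Return id pairs whose n-gram Jaccard similarity meets the threshold.

    Only pairs that share at least one n-gram in the index are scored,
    so instances with nothing in common are never compared.
    (Assumption: Jaccard over bigrams stands in for the paper's
    character-level similarity, which the abstract does not specify.)
    """
    index = build_index(instances, n)
    grams = {i: char_ngrams(v, n) for i, v in instances.items()}

    # Collect pairs that co-occur in at least one posting list.
    co_occurring = set()
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            co_occurring.add((a, b))

    # Score only those pairs; the union is non-empty because they share a gram.
    candidates = set()
    for a, b in co_occurring:
        overlap = len(grams[a] & grams[b])
        union = len(grams[a] | grams[b])
        if overlap / union >= threshold:
            candidates.add((a, b))
    return candidates


# Made-up sample literals, one per instance id.
people = {
    "a1": "Jeff Heflin",
    "b7": "Jeff Hefflin",
    "c3": "Dezhao Song",
}
print(select_candidates(people))  # prints {('a1', 'b7')}
```

The posting lists keep the third instance from ever being scored against the other two, which is the pruning effect the abstract refers to. A real system would also need to cap very long posting lists, since common n-grams make the co-occurrence step quadratic.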

Topics: Linked Data, Entity Coreference, Scalability, Candidate Selection
Year: 2011
DOI identifier: 10.1007/978-3-642-25073-6_41
OAI identifier: oai:CiteSeerX.psu:10.1.1.487.3988
Provided by: CiteSeerX
Download PDF:
Sorry, we are unable to provide the full text but you may find it at the following location(s):
  • http://iswc2011.semanticweb.or... (external link)
  • http://iswc2011.semanticweb.or... (external link)
  • http://citeseerx.ist.psu.edu/v... (external link)