Simplifying Entity Resolution on Web Data with Schema-agnostic, Non-iterative Matching
Entity Resolution (ER) aims to identify different descriptions in various Knowledge Bases (KBs) that refer to the same entity. ER is challenged by the Variety, Volume and Veracity of descriptions published in the Web of Data. To address them, we propose the MinoanER framework that fulfills full automation and support of highly heterogeneous entities. MinoanER leverages a token-based similarity of entities to define a new metric that derives the similarity of neighboring entities from the most important relations, indicated only by statistics. For high efficiency, similarities are computed from a set of schema-agnostic blocks and processed in a non-iterative way that involves four threshold-free heuristics. We demonstrate that the effectiveness of MinoanER is comparable to existing ER tools over real KBs exhibiting low heterogeneity in terms of entity types and content. Yet, MinoanER outperforms state-of-the-art ER tools when matching highly heterogeneous KBs.
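The abstract above combines two ideas: schema-agnostic blocks and a token-based similarity. A minimal sketch of both, written under stated assumptions and not presented as MinoanER's implementation, follows. Entities are assumed to be plain dicts of attribute name to string value. Tokenization ignores attribute names, which is what makes the blocking schema-agnostic. Every token shared by the two KBs forms one block. The per-token weight 1/log2(EF1(t) * EF2(t) + 1), where EF is the number of entities in a KB containing token t, is an assumed inverse-frequency weighting that rewards rare tokens. The paper defines its own metric, and that metric may differ. The function names `tokens`, `token_blocks` and `value_similarity` are hypothetical.

```python
import math
import re
from collections import defaultdict
from itertools import product


def tokens(entity):
    """Schema-agnostic tokenization: attribute names are ignored, only values are split."""
    out = set()
    for value in entity.values():
        out.update(t for t in re.split(r"\W+", value.lower()) if t)
    return out


def token_blocks(kb1, kb2):
    """Token blocking: one block per token, keeping only tokens found in both KBs."""
    side1, side2 = defaultdict(set), defaultdict(set)
    for eid, ent in kb1.items():
        for t in tokens(ent):
            side1[t].add(eid)
    for eid, ent in kb2.items():
        for t in tokens(ent):
            side2[t].add(eid)
    # Each block maps token -> (ids from KB1, ids from KB2).
    return {t: (side1[t], side2[t]) for t in side1.keys() & side2.keys()}


def value_similarity(blocks):
    """Sum a frequency-dampened weight over every token two entities share.

    Assumption: weight = 1 / log2(EF1 * EF2 + 1). A block's two sides hold all
    entities of each KB that contain the token, so the side sizes are EF1 and EF2.
    """
    sims = defaultdict(float)
    for ids1, ids2 in blocks.values():
        weight = 1.0 / math.log2(len(ids1) * len(ids2) + 1)
        # Only cross-KB pairs that co-occur in a block are ever scored.
        for pair in product(sorted(ids1), sorted(ids2)):
            sims[pair] += weight
    return dict(sims)
```

Take a toy example in which "heraklion" appears once per KB and "greece" appears twice in KB1. A pair sharing both tokens scores 1 + 1/log2(3), or about 1.63. A pair sharing only "greece" scores about 0.63. Pairs with no common token are never compared. This is where blocking saves work.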
MinoanER: Schema-Agnostic, Non-Iterative, Massively Parallel Resolution of Web Entities
Entity Resolution (ER) aims to identify different descriptions in various
Knowledge Bases (KBs) that refer to the same entity. ER is challenged by the
Variety, Volume and Veracity of entity descriptions published in the Web of
Data. To address them, we propose the MinoanER framework that simultaneously
fulfills full automation, support of highly heterogeneous entities, and massive
parallelization of the ER process. MinoanER leverages a token-based similarity
of entities to define a new metric that derives the similarity of neighboring
entities from the most important relations, as they are indicated only by
statistics. A composite blocking method is employed to capture different
sources of matching evidence from the content, neighbors, or names of entities.
The search space of candidate pairs for comparison is compactly abstracted by a
novel disjunctive blocking graph and processed by a non-iterative, massively
parallel matching algorithm that consists of four generic, schema-agnostic
matching rules that are quite robust with respect to their internal
configuration. We demonstrate that the effectiveness of MinoanER is comparable
to existing ER tools over real KBs exhibiting low Variety, but it outperforms
them significantly when matching KBs with high Variety.
Comment: Presented at EDBT 2019.
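The second abstract describes a disjunctive blocking graph processed by a single, non-iterative matching pass. The sketch below illustrates that shape under simplifying assumptions. It does not reproduce the paper's four matching rules. A cross-KB edge exists if either kind of evidence links the pair, which is the disjunction. Each edge stores a value score and a neighbor score. Both are taken as hypothetical precomputed inputs here, and how neighbor similarity is derived from the most important relations is left out. Each node then picks, in one pass, the candidate with the lowest summed rank across the two scores. Threshold-free rank aggregation is an assumption of this sketch. A pair is kept only if the choice is reciprocal, a common ER heuristic also used here as an assumption. All function names are hypothetical.

```python
from collections import defaultdict


def build_disjunctive_graph(value_sims, neighbor_sims):
    """Add an edge when EITHER evidence source links the pair; store both scores."""
    graph = defaultdict(dict)  # e1 -> {e2: (value_score, neighbor_score)}
    for e1, e2 in set(value_sims) | set(neighbor_sims):
        graph[e1][e2] = (value_sims.get((e1, e2), 0.0),
                         neighbor_sims.get((e1, e2), 0.0))
    return graph


def rank_positions(candidates, index):
    """Rank candidates by one score (0 = best); ties broken by id for determinism."""
    ordered = sorted(candidates, key=lambda c: (-candidates[c][index], c))
    return {c: pos for pos, c in enumerate(ordered)}


def best_candidates(graph):
    """Single, non-iterative pass: each node picks the lowest summed rank."""
    best = {}
    for node, cands in graph.items():
        v_rank = rank_positions(cands, 0)
        n_rank = rank_positions(cands, 1)
        best[node] = min(cands, key=lambda c: (v_rank[c] + n_rank[c], c))
    return best


def reciprocal_matches(graph):
    """Keep (e1, e2) only if e1 picks e2 AND e2 picks e1 (reciprocity filter)."""
    forward = best_candidates(graph)
    reverse = defaultdict(dict)
    for e1, cands in graph.items():
        for e2, evidence in cands.items():
            reverse[e2][e1] = evidence
    backward = best_candidates(reverse)
    return {(e1, e2) for e1, e2 in forward.items() if backward.get(e2) == e1}
```

Because every node is visited once and decisions are never revisited, the pass involves no global iteration. It can also be distributed per node, in line with the massive parallelization the abstract mentions. A weak candidate that prefers an entity already claimed by a stronger partner is dropped by the reciprocity check instead of by a similarity threshold.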