4,224 research outputs found
MinoanER: Schema-Agnostic, Non-Iterative, Massively Parallel Resolution of Web Entities
Entity Resolution (ER) aims to identify different descriptions in various
Knowledge Bases (KBs) that refer to the same entity. ER is challenged by the
Variety, Volume and Veracity of entity descriptions published in the Web of
Data. To address them, we propose the MinoanER framework that simultaneously
fulfills full automation, support of highly heterogeneous entities, and massive
parallelization of the ER process. MinoanER leverages a token-based similarity
of entities to define a new metric that derives the similarity of neighboring
entities from the most important relations, as they are indicated only by
statistics. A composite blocking method is employed to capture different
sources of matching evidence from the content, neighbors, or names of entities.
The search space of candidate pairs for comparison is compactly abstracted by a
novel disjunctive blocking graph and processed by a non-iterative, massively
parallel matching algorithm that consists of four generic, schema-agnostic
matching rules that are quite robust with respect to their internal
configuration. We demonstrate that the effectiveness of MinoanER is comparable
to existing ER tools over real KBs exhibiting low Variety, but it outperforms
them significantly when matching KBs with high Variety.Comment: Presented at EDBT 2001
QUASII: QUery-Aware Spatial Incremental Index.
With large-scale simulations of increasingly detailed models and improvement of data acquisition technologies, massive amounts of data are easily and quickly created and collected. Traditional systems require indexes to be built before analytic queries can be executed efficiently. Such an indexing step requires substantial computing resources and introduces a considerable and growing data-to-insight gap where scientists need to wait before they can perform any analysis. Moreover, scientists often only use a small fraction of the data - the parts containing interesting phenomena - and indexing it fully does not always pay off. In this paper we develop a novel incremental index for the exploration of spatial data. Our approach, QUASII, builds a data-oriented index as a side-effect of query execution. QUASII distributes the cost of indexing across all queries, while building the index structure only for the subset of data queried. It reduces data-to-insight time and curbs the cost of incremental indexing by gradually and partially sorting the data, while producing a data-oriented hierarchical structure at the same time. As our experiments show, QUASII reduces the data-to-insight time by up to a factor of 11.4x, while its performance converges to that of the state-of-the-art static indexes
- …