MinoanER: Schema-Agnostic, Non-Iterative, Massively Parallel Resolution of Web Entities

Author: Christophides Vassilis
Efthymiou Vasilis
Papadakis George
Stefanidis Kostas
Publication date: 15/05/2019

Entity Resolution (ER) aims to identify different descriptions in various Knowledge Bases (KBs) that refer to the same entity. ER is challenged by the Variety, Volume and Veracity of entity descriptions published in the Web of Data. To address these challenges, we propose the MinoanER framework, which simultaneously provides full automation, support for highly heterogeneous entities, and massive parallelization of the ER process. MinoanER leverages a token-based similarity of entities to define a new metric that derives the similarity of neighboring entities from the most important relations, as indicated by statistics alone. A composite blocking method is employed to capture different sources of matching evidence from the content, neighbors, or names of entities. The search space of candidate pairs for comparison is compactly abstracted by a novel disjunctive blocking graph and processed by a non-iterative, massively parallel matching algorithm that consists of four generic, schema-agnostic matching rules that are quite robust with respect to their internal configuration. We demonstrate that the effectiveness of MinoanER is comparable to existing ER tools over real KBs exhibiting low Variety, but it outperforms them significantly when matching KBs with high Variety.

Comment: Presented at EDBT 2019
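
The token-based, schema-agnostic idea in the abstract above can be illustrated with a minimal sketch. This is not the MinoanER implementation: the function names (`tokenize`, `token_blocks`, `candidate_pairs`), the toy KBs, and the weighting that discounts frequent tokens are all assumptions made for illustration. Attribute names are ignored entirely, so two descriptions become a candidate pair whenever any of their values share a token.

```python
import math
import re
from collections import defaultdict


def tokenize(description):
    """Collect lowercase word tokens from all attribute values (attribute names are ignored)."""
    tokens = set()
    for value in description.values():
        tokens.update(re.findall(r"[a-z0-9]+", str(value).lower()))
    return tokens


def token_blocks(kb):
    """Map each token to the set of entity ids whose values contain it."""
    blocks = defaultdict(set)
    for entity_id, description in kb.items():
        for token in tokenize(description):
            blocks[token].add(entity_id)
    return blocks


def candidate_pairs(kb1, kb2):
    """Score cross-KB pairs that share a token; rarer tokens contribute more (assumed weighting)."""
    blocks1, blocks2 = token_blocks(kb1), token_blocks(kb2)
    scores = defaultdict(float)
    for token in blocks1.keys() & blocks2.keys():
        ids1, ids2 = blocks1[token], blocks2[token]
        # Hypothetical weight: shrinks as the token appears in more entities.
        weight = 1.0 / math.log2(len(ids1) * len(ids2) + 1)
        for a in ids1:
            for b in ids2:
                scores[(a, b)] += weight
    return dict(scores)


# Toy KBs with different schemas (hypothetical data).
kb1 = {"a1": {"name": "Heraklion", "country": "Greece"},
       "a2": {"label": "Athens", "in": "Greece"}}
kb2 = {"b1": {"title": "Heraklion city", "located": "Greece"},
       "b2": {"title": "Athens"}}

scores = candidate_pairs(kb1, kb2)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

Pairs that share no token never enter `scores`, which is how blocking of this kind reduces the number of comparisons; the neighbor and name evidence mentioned in the abstract is not modeled here.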

arXiv.org e-Print Archive

TamPub Julkaisuarkisto - TamPub Institutional Repository

Trepo - Institutional Repository of Tampere University

Hierarchical information clustering by means of topologically embedded graphs

Author: Aste Tomaso
Di Matteo T.
Song Won-Min
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 10/12/2015

We introduce a graph-theoretic approach to extract clusters and hierarchies in complex data-sets in an unsupervised and deterministic manner, without the use of any prior information. This is achieved by building topologically embedded networks containing the subset of most significant links and analyzing the network structure. For a planar embedding, this method provides both the intra-cluster hierarchy, which describes the way clusters are composed, and the inter-cluster hierarchy, which describes how clusters gather together. We discuss the performance, robustness and reliability of this method by first investigating several artificial data-sets, finding that it can significantly outperform other established approaches. Then we show that our method can successfully differentiate meaningful clusters and hierarchies in a variety of real data-sets. In particular, we find that the application to gene expression patterns of lymphoma samples uncovers biologically significant groups of genes which play key roles in diagnosis, prognosis and treatment of some of the most relevant human lymphoid malignancies.

The Australian National University

Hierarchical information clustering by means of topologically embedded graphs

Author: A Alizadeh
A Jain
AI Saez
AJ Nathalie
BB Ding
C Rivera
D Arthur
D Garlaschelli
DL Davies
DM Rocke
G Caldarelli
G Lenz
G Ringel
G Romeo
GL Pellegrini
GP Coffey
H Hooyberghs
IS Lossos
IT Hernádvölgyi
J Dunn
J Handl
J McQueen
J Quackenbush
J Ruan
J Shi
J Wang
JM Boyer
JS Abramson
JSJ Andrade
KII Goh
L Amaral
L Chen
L Hubert
L Leseux
LL Lam
M Arsura
M Eisen
M Filipits
M Girvan
M Kitsak
M Tumminello
MC de Souto
N Wada
PF Jonsson
R Diestel
R Seki
R Xu
RA Fisher
S Fortunato
ShaunS Wang
SV Buldyrev
T Aste
T Di Matteo
T Kamijo
T Kohonen
T Sorensen
T. Di Matteo
Tomaso Aste
U von Luxburg
WM Song
Won-Min Song
X Zhao
XF Zhao
Ying Xu
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 20/10/2011

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

PubMed Central

Kent Academic Repository

King's Research Portal

FigShare