22,922 research outputs found
MAG: A Multilingual, Knowledge-base Agnostic and Deterministic Entity Linking Approach
Entity linking has recently been the subject of a significant body of
research. Currently, the best performing approaches rely on trained
mono-lingual models. Porting these approaches to other languages is
consequently a difficult endeavor as it requires corresponding training data
and retraining of the models. We address this drawback by presenting a novel
multilingual, knowledge-based agnostic and deterministic approach to entity
linking, dubbed MAG. MAG is based on a combination of context-based retrieval
on structured knowledge bases and graph algorithms. We evaluate MAG on 23 data
sets and in 7 languages. Our results show that the best approach trained on
English datasets (PBOH) achieves a micro F-measure that is up to 4 times worse
on datasets in other languages. MAG, on the other hand, achieves
state-of-the-art performance on English datasets and reaches a micro F-measure
that is up to 0.6 higher than that of PBOH on non-English languages.Comment: Accepted in K-CAP 2017: Knowledge Capture Conferenc
Distributed Holistic Clustering on Linked Data
Link discovery is an active field of research to support data integration in
the Web of Data. Due to the huge size and number of available data sources,
efficient and effective link discovery is a very challenging task. Common
pairwise link discovery approaches do not scale to many sources with very large
entity sets. We here propose a distributed holistic approach to link many data
sources based on a clustering of entities that represent the same real-world
object. Our clustering approach provides a compact and fused representation of
entities, and can identify errors in existing links as well as many new links.
We support a distributed execution of the clustering approach to achieve faster
execution times and scalability for large real-world data sets. We provide a
novel gold standard for multi-source clustering, and evaluate our methods with
respect to effectiveness and efficiency for large data sets from the geographic
and music domains
Streamlining the CERIF XML data exchange format: towards CERIF 2.0.
The Common European Research Information Format (CERIF) is an established standard for Current Research Information Systems (CRISs) facing the increasing need for information sharing and exchange. euroCRIS released the first official CERIF XML exchange format in 2007; it followed the structure of the relational data model. Based on experience with the format and consulting with the CRIS community on newer interoperation and exchange concepts, the authors proposed an update to the CERIF XML exchange format. This updated CERIF XML aimed at compactness with expression and backwards compatibility. With this article, we provide insight into the motivation for change, present the updated format, and finally outline possible next steps
- …