MAG: A Multilingual, Knowledge-base Agnostic and Deterministic Entity Linking Approach
Entity linking has recently been the subject of a significant body of
research. Currently, the best performing approaches rely on trained
mono-lingual models. Porting these approaches to other languages is
consequently a difficult endeavor as it requires corresponding training data
and retraining of the models. We address this drawback by presenting a novel
multilingual, knowledge-base agnostic and deterministic approach to entity
linking, dubbed MAG. MAG is based on a combination of context-based retrieval
on structured knowledge bases and graph algorithms. We evaluate MAG on 23
datasets in 7 languages. Our results show that the best approach trained on
English datasets (PBOH) achieves a micro F-measure that is up to 4 times worse
on datasets in other languages. MAG, on the other hand, achieves
state-of-the-art performance on English datasets and reaches a micro F-measure
that is up to 0.6 higher than that of PBOH on non-English languages.
Comment: Accepted in K-CAP 2017: Knowledge Capture Conference
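The abstract describes a two-stage pipeline: context-based candidate retrieval over a structured knowledge base, followed by graph-based disambiguation. A minimal sketch of the graph idea, assuming a toy edge-set representation of the knowledge base (function and variable names are illustrative, not MAG's actual interface):

```python
def disambiguate(candidates, kb_edges):
    """Toy graph step of a MAG-style linker: for each mention, pick the
    candidate entity with the most knowledge-base links to the other
    mentions' candidates. `candidates` maps a mention to its retrieved
    candidate entities; `kb_edges` is a set of frozenset entity pairs
    standing in for edges of the KB graph."""
    def coherence(entity, mention):
        # Count KB edges from `entity` to any candidate of another mention.
        return sum(
            1
            for other, cands in candidates.items()
            if other != mention
            for cand in cands
            if frozenset((entity, cand)) in kb_edges
        )
    return {
        mention: max(cands, key=lambda e: coherence(e, mention))
        for mention, cands in candidates.items()
    }
```

Because the procedure is a deterministic function of the retrieved candidates and the KB graph, no per-language training is involved, which is the property the abstract emphasises.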
Neural Collective Entity Linking
Entity Linking aims to link entity mentions in texts to knowledge bases, and
neural models have achieved recent success in this task. However, most existing
methods rely on local contexts to resolve entities independently, which may
usually fail due to the data sparsity of local information. To address this
issue, we propose a novel neural model for collective entity linking, named
NCEL. NCEL applies a Graph Convolutional Network to integrate both local
contextual features and global coherence information for entity linking. To
improve computational efficiency, we perform approximate graph convolution
on a subgraph of adjacent entity mentions rather than over all mentions in the
entire text.
We further introduce an attention scheme to improve the robustness of NCEL to
data noise and train the model on Wikipedia hyperlinks to avoid overfitting and
domain bias. In experiments, we evaluate NCEL on five publicly available
datasets to verify the linking performance as well as generalization ability.
We also conduct an extensive analysis of time complexity, the impact of key
modules, and qualitative results, which demonstrate the effectiveness and
efficiency of our proposed method.
Comment: 12 pages, 3 figures, COLING 2018
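The propagation step the abstract relies on can be illustrated with one generic graph-convolution layer over the adjacency matrix of the mention subgraph; this is the standard GCN rule, not NCEL's full architecture, and the variable names are assumptions:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution step, H' = relu(D^-1 (A + I) H W):
    every node (here, a candidate entity in the mention subgraph)
    averages features over itself and its neighbours, then applies
    a linear map W."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))  # row-normalise by degree
    return np.maximum(D_inv @ A_hat @ H @ W, 0.0)  # ReLU
```

Restricting A to adjacent mentions keeps it small and sparse, which is the source of the efficiency gain the abstract claims over convolving across the whole document.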
Probabilistic Bag-Of-Hyperlinks Model for Entity Linking
Many fundamental problems in natural language processing rely on determining
what entities appear in a given text. Commonly referenced as entity linking,
this step is a fundamental component of many NLP tasks such as text
understanding, automatic summarization, semantic search or machine translation.
Name ambiguity, word polysemy, context dependencies and a heavy-tailed
distribution of entities contribute to the complexity of this problem.
We here propose a probabilistic approach that makes use of an effective
graphical model to perform collective entity disambiguation. Input mentions
(i.e., linkable token spans) are disambiguated jointly across an entire
document by combining a document-level prior of entity co-occurrences with
local information captured from mentions and their surrounding context. The
model is based on simple sufficient statistics extracted from data, thus
relying on few parameters to be learned.
Our method does not require extensive feature engineering, nor an expensive
training procedure. We use loopy belief propagation to perform approximate
inference. The low complexity of our model makes this step sufficiently fast
for real-time usage. We demonstrate the accuracy of our approach on a wide
range of benchmark datasets, showing that it matches, and in many cases
outperforms, existing state-of-the-art methods.
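For intuition, the pairwise objective that loopy belief propagation approximates on real documents can be maximised exactly by brute force when there are only a few mentions. A sketch under that assumption (the score tables and names here are hypothetical, not PBOH's actual interface):

```python
from itertools import combinations, product

def joint_disambiguate(candidates, local, pairwise):
    """Exhaustively maximise a pairwise collective-linking objective:
    score(assignment) = sum of per-mention local scores plus per-pair
    entity co-occurrence scores. `candidates` maps mention -> candidate
    entities, `local` maps (mention, entity) -> score, and `pairwise`
    maps frozenset entity pairs -> co-occurrence score."""
    mentions = list(candidates)
    best, best_score = None, float("-inf")
    for assign in product(*(candidates[m] for m in mentions)):
        score = sum(local[m, e] for m, e in zip(mentions, assign))
        score += sum(pairwise.get(frozenset(pair), 0.0)
                     for pair in combinations(assign, 2))
        if score > best_score:
            best, best_score = dict(zip(mentions, assign)), score
    return best
```

The search space grows exponentially in the number of mentions, which is exactly why the paper resorts to loopy belief propagation for approximate inference at document scale.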
Improving the Performance of Semantic Annotators (Amélioration des performances des annotateurs sémantiques)
Semantic annotators play an important role in the transition from the current Web to the Semantic Web. They extract structured information from raw text, making it possible to link to knowledge bases such as DBpedia, YAGO or BabelNet. Many competitions are organized every year to promote research in this field. We present in this thesis our system, which won the Open Knowledge Extraction challenge at the European Semantic Web Conference 2016. For this competition, we implemented a generic approach tested with four semantic annotators.
We focus in this thesis on one semantic annotator in particular, DBpedia Spotlight. We present its limitations along with the approaches we developed to remedy them. We observed an average improvement of 20% over DBpedia Spotlight's current performance on several corpora, drawn mainly from international news sources: Reuters News Stories, MSNBC and the New York Times.