451 research outputs found

    MAG: A Multilingual, Knowledge-base Agnostic and Deterministic Entity Linking Approach

    Entity linking has recently been the subject of a significant body of research. Currently, the best-performing approaches rely on trained monolingual models. Porting these approaches to other languages is consequently a difficult endeavor, as it requires corresponding training data and retraining of the models. We address this drawback by presenting a novel multilingual, knowledge-base agnostic and deterministic approach to entity linking, dubbed MAG. MAG is based on a combination of context-based retrieval on structured knowledge bases and graph algorithms. We evaluate MAG on 23 datasets and in 7 languages. Our results show that the best approach trained on English datasets (PBOH) achieves a micro F-measure that is up to 4 times worse on datasets in other languages. MAG, on the other hand, achieves state-of-the-art performance on English datasets and reaches a micro F-measure that is up to 0.6 higher than that of PBOH on non-English languages. Comment: Accepted in K-CAP 2017: Knowledge Capture Conference.
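    The two-stage idea the abstract describes — deterministic candidate retrieval from a knowledge base, followed by a graph algorithm over the candidates — can be sketched as below. Everything here (the surface-form index, the toy link data, the choice of a plain PageRank) is invented for illustration; MAG's actual indexing and graph construction are more elaborate.

```python
# Hypothetical sketch: deterministic candidate lookup, then a graph
# algorithm (a tiny PageRank) over candidate-entity links so that
# candidates connected in the KB reinforce each other.

def retrieve_candidates(mention, surface_index):
    """Deterministic lookup: exact surface-form match, no trained model."""
    return surface_index.get(mention.lower(), [])

def pagerank(nodes, edges, damping=0.85, iters=50):
    """Plain PageRank over an undirected candidate-entity graph."""
    rank = {n: 1.0 / len(nodes) for n in nodes}
    neighbors = {n: [] for n in nodes}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    for _ in range(iters):
        new = {}
        for n in nodes:
            share = sum(rank[m] / max(len(neighbors[m]), 1)
                        for m in neighbors[n])
            new[n] = (1 - damping) / len(nodes) + damping * share
        rank = new
    return rank

# Toy KB: surface forms -> candidate entities, plus entity-entity links.
surface_index = {"paris": ["Paris_France", "Paris_Texas"],
                 "seine": ["Seine_River"]}
links = [("Paris_France", "Seine_River")]  # the KB relates these two

mentions = ["Paris", "Seine"]
candidates = {m: retrieve_candidates(m, surface_index) for m in mentions}
nodes = sorted({c for cs in candidates.values() for c in cs})
rank = pagerank(nodes, links)

# For each mention, keep its highest-ranked candidate: the isolated
# Paris_Texas loses to Paris_France, which is linked to Seine_River.
linked = {m: max(cs, key=rank.get) for m, cs in candidates.items()}
print(linked)  # -> {'Paris': 'Paris_France', 'Seine': 'Seine_River'}
```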

    Neural Collective Entity Linking

    Entity linking aims to link entity mentions in texts to knowledge bases, and neural models have achieved recent success in this task. However, most existing methods rely on local contexts to resolve entities independently, which often fails due to the sparsity of local information. To address this issue, we propose a novel neural model for collective entity linking, named NCEL. NCEL applies a Graph Convolutional Network to integrate both local contextual features and global coherence information for entity linking. To improve computational efficiency, we approximately perform graph convolution on a subgraph of adjacent entity mentions instead of those in the entire text. We further introduce an attention scheme to improve the robustness of NCEL to data noise, and train the model on Wikipedia hyperlinks to avoid overfitting and domain bias. In experiments, we evaluate NCEL on five publicly available datasets to verify its linking performance as well as its generalization ability. We also conduct an extensive analysis of time complexity, the impact of key modules, and qualitative results, which demonstrates the effectiveness and efficiency of our proposed method. Comment: 12 pages, 3 figures, COLING 2018.
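    The core mechanism — graph convolution over a small subgraph of candidate entities, so that each candidate's local features are mixed with those of coherent neighbours — can be sketched with a single normalised GCN layer. The feature sizes, adjacency matrix and random data below are all made up for illustration and are not NCEL's actual architecture.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution layer: ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # symmetric normalisation
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

rng = np.random.default_rng(0)
n_candidates, d_in, d_out = 4, 8, 8
H = rng.normal(size=(n_candidates, d_in))   # local context features
A = np.array([[0, 1, 0, 0],                 # coherence edges between
              [1, 0, 1, 0],                 # candidates of adjacent
              [0, 1, 0, 1],                 # mentions (toy subgraph)
              [0, 0, 1, 0]], dtype=float)
W = rng.normal(size=(d_in, d_out))

H_out = gcn_layer(H, A, W)
print(H_out.shape)  # -> (4, 8)
```

    Restricting A to candidates of adjacent mentions, as the abstract describes, keeps this matrix small regardless of document length.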

    Probabilistic Bag-Of-Hyperlinks Model for Entity Linking

    Many fundamental problems in natural language processing rely on determining what entities appear in a given text. Commonly referred to as entity linking, this step is a fundamental component of many NLP tasks such as text understanding, automatic summarization, semantic search or machine translation. Name ambiguity, word polysemy, context dependencies and a heavy-tailed distribution of entities contribute to the complexity of this problem. We here propose a probabilistic approach that makes use of an effective graphical model to perform collective entity disambiguation. Input mentions (i.e., linkable token spans) are disambiguated jointly across an entire document by combining a document-level prior of entity co-occurrences with local information captured from mentions and their surrounding context. The model is based on simple sufficient statistics extracted from data, and thus relies on few parameters to be learned. Our method requires neither extensive feature engineering nor an expensive training procedure. We use loopy belief propagation to perform approximate inference. The low complexity of our model makes this step sufficiently fast for real-time usage. We demonstrate the accuracy of our approach on a wide range of benchmark datasets, showing that it matches, and in many cases outperforms, existing state-of-the-art methods.
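    A minimal sum-product belief propagation sketch on a toy two-mention model illustrates the kind of joint inference described here: each mention is a variable over its candidate entities, unary potentials stand in for local context scores, and a pairwise potential stands in for the entity co-occurrence prior. All potentials below are invented; PBOH's actual factor graph and statistics are far richer.

```python
import numpy as np

unary = {                            # mention -> scores over candidates
    "m1": np.array([0.6, 0.4]),      # candidates e0, e1
    "m2": np.array([0.5, 0.5]),      # candidates f0, f1
}
# pairwise[(i, j)][a, b]: co-occurrence compatibility of (cand a, cand b)
pairwise = {("m1", "m2"): np.array([[2.0, 0.5],
                                    [0.5, 1.0]])}

# Messages msgs[(src, dst)] live over dst's candidates, start uniform.
msgs = {}
for i, j in pairwise:
    msgs[(i, j)] = np.ones_like(unary[j])
    msgs[(j, i)] = np.ones_like(unary[i])

for _ in range(20):  # loopy iterations (this toy graph is a tree)
    for i, j in list(msgs):
        pot = pairwise[(i, j)] if (i, j) in pairwise else pairwise[(j, i)].T
        # Product of unary potential and all incoming messages except j's.
        incoming = unary[i] * np.prod(
            [msgs[(k, t)] for (k, t) in msgs if t == i and k != j]
            + [np.ones_like(unary[i])], axis=0)
        m = incoming @ pot
        msgs[(i, j)] = m / m.sum()   # normalise for numerical stability

beliefs = {}
for i in unary:
    b = unary[i] * np.prod([msgs[(k, t)] for (k, t) in msgs if t == i],
                           axis=0)
    beliefs[i] = b / b.sum()

# The pairwise prior pulls m1 toward e0 and m2 toward f0 *jointly*.
print({m: int(b.argmax()) for m, b in beliefs.items()})  # -> {'m1': 0, 'm2': 0}
```

    On real documents the mention graph has cycles, which is exactly why the inference is "loopy" and only approximate.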

    Amélioration des performances des annotateurs sémantiques [Improving the performance of semantic annotators]

    Semantic annotators play an important role in the transition from the current Web to the Semantic Web. They extract structured information from raw text, making it possible to link to knowledge bases such as DBpedia, YAGO or BabelNet. Many competitions are organized every year to promote research in this field. In this thesis we present our system, which won the Open Knowledge Extraction challenge at the European Semantic Web Conference 2016. For this competition, we implemented a generic approach and tested it with four semantic annotators. We focus in particular on one semantic annotator, DBpedia Spotlight. We describe its limitations along with the approaches we developed to remedy them. We observed an average improvement of 20% over DBpedia Spotlight's current performance on various corpora, drawn mainly from international news sources: "Reuters News Stories", "MSNBC" and the "New York Times".
