A Topic-Sensitive Model for Salient Entity Linking
Abstract. In recent years, the number of entities in large knowledge bases available on the Web has been increasing rapidly. Such entities can be used to bridge textual data with knowledge bases and thus help with many tasks, such as text understanding, word sense disambiguation and information retrieval. The key issue is to link the entity mentions in documents with the corresponding entities in knowledge bases, a task referred to as entity linking. In addition, for many entity-centric applications, entity salience within a document has become a very important factor. This raises a pressing need to identify the set of salient entities that are central to the input document. In this paper, we introduce the new task of salient entity linking and propose a graph-based disambiguation solution that integrates several features, in particular a topic-sensitive model based on Wikipedia categories. Experimental results show that our method significantly outperforms state-of-the-art entity linking methods in terms of precision, recall and F-measure.
Probabilistic Bag-Of-Hyperlinks Model for Entity Linking
Many fundamental problems in natural language processing rely on determining what entities appear in a given text. Commonly referred to as entity linking, this step is a fundamental component of many NLP tasks such as text understanding, automatic summarization, semantic search and machine translation. Name ambiguity, word polysemy, context dependencies and a heavy-tailed distribution of entities contribute to the complexity of this problem.

We propose a probabilistic approach that uses an effective graphical model to perform collective entity disambiguation. Input mentions (i.e., linkable token spans) are disambiguated jointly across an entire document by combining a document-level prior of entity co-occurrences with local information captured from mentions and their surrounding context. The model is based on simple sufficient statistics extracted from data, and thus relies on few parameters to be learned.

Our method requires neither extensive feature engineering nor an expensive training procedure. We use loopy belief propagation to perform approximate inference; the low complexity of our model makes this step fast enough for real-time use. We demonstrate the accuracy of our approach on a wide range of benchmark datasets, showing that it matches, and in many cases outperforms, existing state-of-the-art methods.
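The joint disambiguation the abstract describes can be illustrated with a minimal sketch: max-product loopy belief propagation over a fully connected graph of mentions, with unary potentials for mention-local scores and pairwise potentials for entity co-occurrence. All mentions, candidate entities, scores and iteration counts below are invented toy values, not the paper's actual statistics.

```python
# Toy sketch: collective entity disambiguation via max-product loopy BP.
# All candidates and scores are assumed values for illustration only.
from math import prod

# candidate entities per mention (hypothetical)
candidates = {
    "m1": ["Paris_(France)", "Paris_(Texas)"],
    "m2": ["France", "France_(band)"],
}
# unary potentials: mention-local score (prior x context), assumed
unary = {
    ("m1", "Paris_(France)"): 0.6, ("m1", "Paris_(Texas)"): 0.4,
    ("m2", "France"): 0.7, ("m2", "France_(band)"): 0.3,
}
# pairwise potentials: entity co-occurrence strength, assumed
cooc = {
    frozenset({"Paris_(France)", "France"}): 0.9,
    frozenset({"Paris_(Texas)", "France"}): 0.2,
}
def pairwise(e1, e2):
    return cooc.get(frozenset({e1, e2}), 0.05)  # small default score

mentions = list(candidates)
# msgs[(i, j)][e_j]: support that mention i gives to entity e_j at j
msgs = {(i, j): {e: 1.0 for e in candidates[j]}
        for i in mentions for j in mentions if i != j}

for _ in range(10):  # fixed sweeps; loopy BP is approximate inference
    new = {}
    for (i, j) in msgs:
        out = {}
        for ej in candidates[j]:
            out[ej] = max(
                unary[(i, ei)] * pairwise(ei, ej) *
                prod(msgs[(k, i)][ei] for k in mentions if k not in (i, j))
                for ei in candidates[i])
        z = sum(out.values())  # normalize to keep messages stable
        new[(i, j)] = {e: v / z for e, v in out.items()}
    msgs = new

def belief(i, e):
    # max-marginal: unary score times all incoming messages
    return unary[(i, e)] * prod(msgs[(k, i)][e] for k in mentions if k != i)

linking = {i: max(candidates[i], key=lambda e: belief(i, e))
           for i in mentions}
print(linking)
```

Here the strong co-occurrence between "Paris_(France)" and "France" pulls both mentions toward their geographic readings jointly, which is the point of collective (rather than mention-by-mention) disambiguation.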
Deliverable D7.7 Dissemination and Standardisation Report v3
This deliverable presents the LinkedTV dissemination and standardisation report for the project period of months 31 to 42 (April 2014 to March 2015).
Entity Knowledge Base Creation from Czech Wikipedia
The aim of this thesis is to propose and implement a system for the automatic extraction of named entities from Czech Wikipedia, to create a knowledge base of these entities, and to evaluate the results of the created system. The first part explains the basic notions of this area of natural language processing and discusses related work. The main part proposes several extraction methods and details their implementation. The following entity types are extracted: people, places, events and organizations. The final part presents the results, i.e., the accuracy of the individual methods for each entity type and statistics on the extraction of the individual entities across the whole of Czech Wikipedia.
Things and Strings and More: Improving Place Name Disambiguation from Short Texts by Combining Entity Co-Occurrence, Topic Modeling, and Word Embedding
Place name disambiguation, i.e., toponym disambiguation or toponym resolution, is the task of correctly identifying a place from a set of places sharing a common name. It contributes to a variety of tasks such as knowledge extraction, query answering, geographic information retrieval, and automatic tagging. Disambiguation quality relies on the ability to correctly identify and interpret contextual clues, which complicates the task for short texts. Here I propose a novel approach to the disambiguation of place names in short texts that integrates three models: entity co-occurrence, topic modeling, and word embedding. The first model uses Linked Data to identify related entities and thereby improve disambiguation quality. The second uses topic modeling to differentiate places based on the terms used to describe them. The third uses word embeddings to uncover the semantic relatedness between places and contexts. I evaluate this approach on a corpus of short texts collected through web scraping, determine suitable weights for the models, and demonstrate that the combined model, i.e., the Things and Strings Model, outperforms benchmark systems such as DBpedia Spotlight, TextRazor, and Open Calais by up to 85% in F-score and 46% in Precision at 1. A web service demonstrates the proposed method and can serve as a building block for applications that need place name recognition and disambiguation.
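The combination of the three signals with tuned weights can be sketched as a simple linear score over candidate places. The candidate names, per-model scores, and weights below are all invented for illustration; the paper determines its actual weights empirically.

```python
# Toy sketch: weighted combination of three disambiguation signals
# (entity co-occurrence, topic model, word embedding). All scores
# and weights are assumed values, not the paper's.
candidates = {
    "Washington,_D.C.":    {"cooc": 0.8, "topic": 0.6, "embed": 0.7},
    "Washington_(state)":  {"cooc": 0.5, "topic": 0.7, "embed": 0.4},
    "Washington,_England": {"cooc": 0.2, "topic": 0.1, "embed": 0.3},
}

WEIGHTS = {"cooc": 0.4, "topic": 0.3, "embed": 0.3}  # assumed; tuned on data

def combined_score(scores, weights=WEIGHTS):
    # linear combination of the per-model scores
    return sum(weights[k] * scores[k] for k in weights)

best = max(candidates, key=lambda p: combined_score(candidates[p]))
print(best)
```

With these toy numbers the co-occurrence signal dominates and "Washington,_D.C." wins; shifting weight toward the topic model would favor "Washington_(state)", which is exactly why the weights have to be tuned on held-out data.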
A Survey of the First 20 Years of Research on Semantic Web and Linked Data
This paper is a survey of the research topics in the field of the Semantic Web, Linked Data and the Web of Data. The study looks at the contributions of this research community over its first twenty years of existence. Compiling several bibliographical sources and bibliometric indicators, we identify the main research trends and reference some of their major publications to provide an overview of that initial period. We conclude with some perspectives on future research challenges.
Deliverable D9.3 Final Project Report
This document comprises the final report of LinkedTV. It includes a publishable summary, a plan for the use and dissemination of foreground, and a report, in the form of a questionnaire, covering the wider societal implications of the project.