Search CORE

3,035 research outputs found

Named Entity Extraction and Linking Challenge: University of Twente at #Microposts2014

Author: Habib Mena B.
Keulen Maurice van
Zhu Zhemin
Publication venue: CEUR-WS.org
Publication date: 01/01/2014
Field of study

Twitter is a potentially rich source of continuously and instantly updated information. Shortness and informality of tweets are challenges for Natural Language Processing (NLP) tasks. In this paper, we present a hybrid approach for Named Entity Extraction (NEE)and Linking (NEL) for tweets. Although NEE and NEL are two topics that are well studied in literature, almost all approaches treated the two problems separately. We believe that disambiguation (linking) could help improving the extraction process. We call this potential for mutual improvement, the reinforcement effect. It mimics the way humans understand natural language. Furthermore, our proposed approaches handles uncertainties involved in the two processes by considering possible alternatives

CiteSeerX

Maastricht University Research Portal

University of Twente Research Information

A Survey of Location Prediction on Twitter

Author: Han Jialong
Sun Aixin
Zheng Xin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

Locations, e.g., countries, states, cities, and point-of-interests, are central to news, emergency events, and people's daily lives. Automatic identification of locations associated with or mentioned in documents has been explored for decades. As one of the most popular online social network platforms, Twitter has attracted a large number of users who send millions of tweets on daily basis. Due to the world-wide coverage of its users and real-time freshness of tweets, location prediction on Twitter has gained significant attention in recent years. Research efforts are spent on dealing with new challenges and opportunities brought by the noisy, short, and context-rich nature of tweets. In this survey, we aim at offering an overall picture of location prediction on Twitter. Specifically, we concentrate on the prediction of user home locations, tweet locations, and mentioned locations. We first define the three tasks and review the evaluation metrics. By summarizing Twitter network, tweet content, and tweet context as potential inputs, we then structurally highlight how the problems depend on these inputs. Each dependency is illustrated by a comprehensive review of the corresponding strategies adopted in state-of-the-art approaches. In addition, we also briefly review two related problems, i.e., semantic location prediction and point-of-interest recommendation. Finally, we list future research directions.Comment: Accepted to TKDE. 30 pages, 1 figur

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Probabilistic Bag-Of-Hyperlinks Model for Entity Linking

Author: Bunescu R. C.
Cheng X.
Cucerzan S.
He Z.
Recht B.
Rizzo G.
Spitkovsky V. I.
Yedidia J.
Publication venue
Publication date: 29/01/2016
Field of study

Many fundamental problems in natural language processing rely on determining what entities appear in a given text. Commonly referenced as entity linking, this step is a fundamental component of many NLP tasks such as text understanding, automatic summarization, semantic search or machine translation. Name ambiguity, word polysemy, context dependencies and a heavy-tailed distribution of entities contribute to the complexity of this problem. We here propose a probabilistic approach that makes use of an effective graphical model to perform collective entity disambiguation. Input mentions (i.e.,~linkable token spans) are disambiguated jointly across an entire document by combining a document-level prior of entity co-occurrences with local information captured from mentions and their surrounding context. The model is based on simple sufficient statistics extracted from data, thus relying on few parameters to be learned. Our method does not require extensive feature engineering, nor an expensive training procedure. We use loopy belief propagation to perform approximate inference. The low complexity of our model makes this step sufficiently fast for real-time usage. We demonstrate the accuracy of our approach on a wide range of benchmark datasets, showing that it matches, and in many cases outperforms, existing state-of-the-art methods

arXiv.org e-Print Archive

Crossref

Learning Relatedness Measures for Entity Linking

Author: Ceccarelli Diego
Lucchese Claudio
Orlando Salvatore
Perego Raffaele
Trani Salvatore
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2013
Field of study

Entity Linking is the task of detecting, in text documents, relevant mentions to entities of a given knowledge base. To this end, entity-linking algorithms use several signals and features extracted from the input text or from the knowl- edge base. The most important of such features is entity relatedness. Indeed, we argue that these algorithms benefit from maximizing the relatedness among the relevant enti- ties selected for annotation, since this minimizes errors in disambiguating entity-linking. The definition of an e↵ective relatedness function is thus a crucial point in any entity-linking algorithm. In this paper we address the problem of learning high-quality entity relatedness functions. First, we formalize the problem of learning entity relatedness as a learning-to-rank problem. We propose a methodology to create reference datasets on the basis of manually annotated data. Finally, we show that our machine-learned entity relatedness function performs better than other relatedness functions previously proposed, and, more importantly, improves the overall performance of dif- ferent state-of-the-art entity-linking algorithms

Archivio Ricerca Ca'Foscari

CiteSeerX

Crossref

Archivio della Ricerca - Università di Pisa

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

IMT Institutional Repository

Large-Scale information extraction from textual definitions through deep syntactic and semantic analysis

Author: DELLI BOVI CLAUDIO
NAVIGLI ROBERTO
Telesca Luca
Publication venue
Publication date: 01/01/2015
Field of study

We present DEFIE, an approach to large-scale Information Extraction (IE) based on a syntactic-semantic analysis of textual definitions. Given a large corpus of definitions we leverage syntactic dependencies to reduce data sparsity, then disambiguate the arguments and content words of the relation strings, and finally exploit the resulting information to organize the acquired relations hierarchically. The output of DEFIE is a high-quality knowledge base consisting of several million automatically acquired semantic relations

CiteSeerX

Archivio della ricerca- Università di Roma La Sapienza