Making sense of microposts: (#Microposts2014) named entity extraction & linking challenge

Author: Cano Amparo E.
Dadzie Aba-sah
Rizzo Giuseppe
Rowe Matthew
Stankovic Milan
Varga Andrea
Publication date: 01/01/2014

Microposts are small fragments of social media content and a popular medium for sharing facts, opinions and emotions. They comprise a wealth of data which is increasing exponentially, and which therefore presents new challenges for the information extraction community, among others. This paper describes the ‘Making Sense of Microposts’ (#Microposts2014) Workshop’s Named Entity Extraction and Linking (NEEL) Challenge, held as part of the 2014 World Wide Web conference (WWW’14). The task of this challenge consists of the automatic extraction and linkage of entities appearing within English Microposts on Twitter. Participants were set the task of engineering a named entity extraction and DBpedia linkage system targeting a predefined taxonomy, to be run on the challenge data set, comprising a manually annotated training corpus and a test corpus of Microposts. 43 research groups expressed intent to participate in the challenge, of which 24 signed the agreement required to be given a copy of the training and test datasets. 8 groups fulfilled all submission requirements, out of which 4 were accepted for presentation at the workshop and a further 2 as posters. The submissions covered sequential and joint methods for approaching the named entity extraction and entity linking tasks. We describe the evaluation process and discuss the performance of the different approaches to the #Microposts2014 NEEL Challenge.

Open Research Online (The Open University)

Towards Deep Semantic Analysis Of Hashtags

Author: Bansal Piyush
Bansal Romil
Varma Vasudeva
Publication date: 01/01/2015

Hashtags are semantico-syntactic constructs used across various social networking and microblogging platforms to enable users to start a topic-specific discussion or classify a post into a desired category. Segmenting and linking the entities present within hashtags could therefore help in better understanding and extracting the information shared across social media. However, due to the lack of space delimiters in hashtags (e.g., #nsavssnowden), the segmentation of hashtags into constituent entities ("NSA" and "Edward Snowden" in this case) is not a trivial task. Most current state-of-the-art social media analytics systems, such as Sentiment Analysis and Entity Linking systems, tend either to ignore hashtags or to treat them as a single word. In this paper, we present a context-aware approach to segment entities in hashtags and link them to a knowledge base (KB) entry, based on the context within the tweet. Our approach segments and links the entities in hashtags such that the coherence between hashtag semantics and the tweet is maximized. To the best of our knowledge, no existing study addresses the issue of linking entities in hashtags for extracting semantic information. We evaluate our method on two different datasets, and demonstrate the effectiveness of our technique in improving overall entity linking in tweets via the additional semantic information provided by segmenting and linking entities in a hashtag. Comment: To appear in the 37th European Conference on Information Retrieval.
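To make the segmentation problem in this abstract concrete, here is a minimal sketch of dictionary-based hashtag splitting. It is not the context-aware method the paper proposes: it ignores the surrounding tweet and simply picks the split with the fewest words. The vocabulary `VOCAB` and the function name `segment` are hypothetical, chosen only for illustration.

```python
from functools import lru_cache
from typing import FrozenSet, List, Optional

# Hypothetical toy vocabulary; a real system would draw on a large lexicon
# or on knowledge-base surface forms.
VOCAB: FrozenSet[str] = frozenset({"nsa", "vs", "snowden", "snow", "den"})


def segment(hashtag: str, vocab: FrozenSet[str] = VOCAB) -> Optional[List[str]]:
    """Split a hashtag into the fewest vocabulary words, or return None."""
    text = hashtag.lstrip("#").lower()

    @lru_cache(maxsize=None)
    def best(start: int):
        # Best (shortest) split of text[start:], as a tuple of words.
        if start == len(text):
            return ()
        candidates = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in vocab:
                rest = best(end)
                if rest is not None:
                    candidates.append((word,) + rest)
        return min(candidates, key=len) if candidates else None

    result = best(0)
    return list(result) if result is not None else None
```

Preferring the fewest words is a crude stand-in for the coherence objective described above: it favours "snowden" over "snow" + "den", but it cannot choose between equally short splits, which is where tweet context would be needed.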

arXiv.org e-Print Archive

Crossref

A Reverse Approach to Named Entity Extraction and Linking in Microposts

Author: Alyssa Mensch
Cem Sahin
Jason Matterer
Kara Greenfield
Kelly Geyer
Michael Coury
Olga Simek
Rajmonda Caceres
Youngjune Gwon
Publication date: 24/04/2020

In this paper, we present a pipeline for named entity extraction and linking that is designed specifically for noisy, grammatically inconsistent domains where traditional named entity techniques perform poorly. Our approach leverages a large knowledge base to improve entity recognition, while maintaining the use of traditional NER to identify mentions that are not co-referent with any entities in the knowledge base.

CiteSeerX

MAG: A Multilingual, Knowledge-base Agnostic and Deterministic Entity Linking Approach

Author: Bryl Volha
Brümmer Martin
Consoli Sergio
Cucerzan Silviu
Devi Pooja
Erp Marieke Van
Ferreira Thiago Castro
Hoffart Johannes
Juan
Luo Gang
Nuzzolese Andrea-Giovanni
Röder Michael
Steinmetz Nadine
van Erp Marieke
Zhang Lei
Publication venue: Association for Computing Machinery (ACM)
Publication date: 17/10/2017

Entity linking has recently been the subject of a significant body of research. Currently, the best-performing approaches rely on trained monolingual models. Porting these approaches to other languages is consequently a difficult endeavor, as it requires corresponding training data and retraining of the models. We address this drawback by presenting a novel multilingual, knowledge-base agnostic and deterministic approach to entity linking, dubbed MAG. MAG is based on a combination of context-based retrieval on structured knowledge bases and graph algorithms. We evaluate MAG on 23 data sets and in 7 languages. Our results show that the best approach trained on English datasets (PBOH) achieves a micro F-measure that is up to 4 times worse on datasets in other languages. MAG, on the other hand, achieves state-of-the-art performance on English datasets and reaches a micro F-measure that is up to 0.6 higher than that of PBOH on non-English languages. Comment: Accepted in K-CAP 2017: Knowledge Capture Conference.
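The abstract reports results as a micro F-measure. As a reminder of what that aggregate means, the sketch below contrasts micro and macro averaging over per-document counts. The function names and the `(tp, fp, fn)` tuple layout are assumptions made for illustration; this is not the evaluation code used by the authors.

```python
from typing import Iterable, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def f1(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_f1(per_doc: Iterable[Counts]) -> float:
    """Pool the counts over all documents, then compute one F1."""
    docs = list(per_doc)
    tp = sum(d[0] for d in docs)
    fp = sum(d[1] for d in docs)
    fn = sum(d[2] for d in docs)
    return f1(tp, fp, fn)


def macro_f1(per_doc: Iterable[Counts]) -> float:
    """Compute F1 per document, then take the unweighted mean."""
    docs = list(per_doc)
    return sum(f1(*d) for d in docs) / len(docs) if docs else 0.0
```

Because micro averaging pools counts first, documents with many annotations dominate the score, whereas macro averaging weights every document equally.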

arXiv.org e-Print Archive

Crossref

NEED4Tweet: a Twitterbot for tweets named entity extraction and disambiguation

Author: Habib Mena B.
Keulen Maurice van
Publication venue: Association for Computational Linguistics
Publication date: 01/07/2015

In this demo paper, we present NEED4Tweet, a Twitterbot for named entity extraction (NEE) and disambiguation (NED) for Tweets. The straightforward application of state-of-the-art extraction and disambiguation approaches to the informal text widely used in Tweets typically results in significantly degraded performance, due to the lack of formal structure, the lack of sufficient context, and the rarely seen entities involved. In this paper, we introduce a novel framework that copes with these challenges. We rely on contextual and semantic features more than on syntactic features, which are less informative. We believe that disambiguation can help to improve the extraction process. This mimics the way humans understand language.

University of Twente Research Information

Information extraction for social media

Author: Habib Mena B.
Keulen Maurice van
Publication venue: Association for Computational Linguistics
Publication date: 01/01/2014

The rapid growth in IT in the last two decades has led to a growth in the amount of information available online. Social media is a new style of sharing information, and a continuously and instantly updated source of information. In this position paper, we propose a framework for Information Extraction (IE) from unstructured user-generated content on social media. The framework proposes solutions to overcome the IE challenges in this domain, such as the short context, the noisy and sparse content, and the uncertain content. To overcome the challenges facing IE from social media, state-of-the-art approaches need to be adapted to suit the nature of social media posts. The key components and aspects of our proposed framework are noisy text filtering, named entity extraction, named entity disambiguation, feedback loops, and uncertainty handling.

CiteSeerX

University of Twente Research Information

GERBIL: General Entity Annotator Benchmarking Framework

Author: Baron Ciro
Both Andreas
Brümmer Martin
Ceccarelli Diego
Cherix Didier
Cornolti Marco
Eickmann Bernd
Ferragina Paolo
Lemke Christiane
Moro Andrea
Navigli Roberto
Ngonga Ngomo Axel Cyrille
Piccinno Francesco
Rizzo Giuseppe
Röder Michael
Sack Harald
Speck René
Troncy Raphaël
Usbeck Ricardo
Waitelonis Jörg
Wesemann Lars
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2015

We present GERBIL, an evaluation framework for semantic entity annotation. The rationale behind our framework is to provide developers, end users and researchers with easy-to-use interfaces that allow for the agile, fine-grained and uniform evaluation of annotation tools on multiple datasets. By these means, we aim to ensure that both tool developers and end users can derive meaningful insights pertaining to the extension, integration and use of annotation applications. In particular, GERBIL provides comparable results to tool developers so as to allow them to easily discover the strengths and weaknesses of their implementations with respect to the state of the art. With the permanent experiment URIs provided by our framework, we ensure the reproducibility and archiving of evaluation results. Moreover, the framework generates data in a machine-processable format, allowing for the efficient querying and post-processing of evaluation results. Finally, the tool diagnostics provided by GERBIL allow deriving insights pertaining to the areas in which tools should be further refined, thus allowing developers to create an informed agenda for extensions and end users to detect the right tools for their purposes. GERBIL aims to become a focal point for the state of the art, driving the research agenda of the community by presenting comparable objective evaluation results.

Archivio della Ricerca - Università di Pisa

Using Embeddings for Both Entity Recognition and Linking in Tweets

Author: Attardi Giuseppe
Sartiano Daniele
Simi Maria
Sucameli Irene
Publication venue: country:ITA
Publication date: 01/01/2016

The paper describes our submissions to the task on Named Entity rEcognition and Linking in Italian Tweets (NEEL-IT) at Evalita 2016. Our approach relies on a technique of Named Entity tagging that exploits both character-level and word-level embeddings. Character-based embeddings allow learning the idiosyncrasies of the language used in tweets. Using a full-blown Named Entity tagger allows recognizing a wider range of entities than those well known by their presence in a Knowledge Base or gazetteer. Our submissions achieved the first, second and fourth highest official scores.

Crossref

Archivio della Ricerca - Università di Pisa

OpenEdition

Combining multiple signals for semanticizing tweets: University of Amsterdam at #Microposts2015

Author: de Rijke M.
Graus D.P.
Gârbacea C.
Odijk D.
Sijaranamual I.
Publication venue: CEUR-WS
Publication date: 01/01/2015

International Migration, Integration and Social Cohesion online publications