UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Linking Entities in Tweets
This paper describes the participation of the UNIBA team in the Named Entity rEcognition and Linking (NEEL) Challenge. We propose a knowledge-based algorithm able to recognize and link named entities in English tweets. The approach combines the simple Lesk algorithm with information from both a distributional semantic model and the usage frequency of Wikipedia concepts. The algorithm performs poorly in entity recognition, but it achieves good results in the disambiguation step.
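A minimal sketch of the kind of scoring the abstract describes, assuming a toy distributional space and a simple linear mix of context/gloss similarity and prior usage frequency. The vectors, candidate glosses, priors, and the weight alpha are all hypothetical, and the UNIBA system's actual scoring function may differ. Context and gloss are each represented as the sum of their word vectors, and the candidate with the highest combined score wins.

```python
import math

# Hypothetical toy distributional space (word -> vector); a real system
# would learn these vectors from a large corpus.
DSM = {
    "iphone":  [1.0, 0.0, 0.1],
    "company": [0.8, 0.2, 0.1],
    "fruit":   [0.0, 1.0, 0.2],
    "tree":    [0.1, 0.9, 0.3],
    "eat":     [0.0, 0.8, 0.1],
    "new":     [0.3, 0.3, 0.3],
}
DIM = 3

# Hypothetical candidates: concept -> (gloss words, prior usage frequency).
CANDIDATES = {
    "Apple_Inc.": (["company", "iphone"], 0.6),
    "Apple_(fruit)": (["fruit", "tree", "eat"], 0.4),
}

def text_vector(words):
    """Additive composition: sum the vectors of the words found in the DSM."""
    vec = [0.0] * DIM
    for w in words:
        for i, x in enumerate(DSM.get(w, [0.0] * DIM)):
            vec[i] += x
    return vec

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def disambiguate(context, candidates, alpha=0.7):
    """Return the concept maximizing alpha * sim(context, gloss) + (1 - alpha) * prior."""
    ctx = text_vector(context)

    def score(item):
        gloss, prior = item[1]
        return alpha * cosine(ctx, text_vector(gloss)) + (1 - alpha) * prior

    return max(candidates.items(), key=score)[0]

print(disambiguate(["new", "iphone"], CANDIDATES))  # Apple_Inc.
print(disambiguate(["eat", "fruit"], CANDIDATES))   # Apple_(fruit)
```

Moving alpha toward 1 trusts the tweet context more; moving it toward 0 falls back to the most frequent sense.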
Making Sense of Microposts (#Microposts2015) Social Sciences Track
For the first time in its five-year history, the #Microposts workshop features a designated Social Science track. This paper introduces the new track by situating it within the overall workshop objectives. It highlights the importance of interdisciplinary studies in making sense of Web user activities in general, and of the generation and consumption of Microposts in particular. The paper provides examples of related work in the field, such as Computational Social Science, reviews previous contributions to #Microposts by the Social Science research community, and introduces the two papers presented in the track.
Entity Linking for the Semantic Annotation of Italian Tweets
Linking entity mentions in Italian tweets to concepts in a knowledge base is a challenging task, due to the short and noisy nature of these messages and the lack of specific resources for Italian. This paper proposes an adaptation, to the context of Italian tweets, of a general-purpose Named Entity Linking algorithm that exploits a similarity measure computed over a Distributional Semantic Model. To evaluate the proposed algorithm, we introduce a new dataset of tweets for entity linking, developed specifically for the Italian language.
Ontology Enrichment from Texts: A Biomedical Dataset for Concept Discovery and Placement
Mentions of new concepts appear regularly in texts and require automated approaches to harvest them and place them into Knowledge Bases (KBs), e.g., ontologies and taxonomies. Existing datasets suffer from three issues: (i) they mostly assume that a new concept is pre-discovered and cannot support out-of-KB mention discovery; (ii) they use only the concept label, along with the KB, as input and thus lack the contexts of a concept label; and (iii) they mostly focus on concept placement w.r.t. a taxonomy of atomic concepts, instead of complex concepts, i.e., concepts with logical operators. To address these issues, we propose a new benchmark, adapting the MedMentions dataset (PubMed abstracts) with the 2014 and 2017 versions of SNOMED CT under the Diseases sub-category and the broader categories of Clinical finding, Procedure, and Pharmaceutical / biologic product. We demonstrate how the dataset can be used to evaluate out-of-KB mention discovery and concept placement, adapting recent Large Language Model-based methods.
Comment: 5 pages, 1 figure, accepted for CIKM 2023. The dataset, data construction scripts, and baseline implementation are available at https://zenodo.org/record/8228005 (Zenodo) and https://github.com/KRR-Oxford/OET (GitHub).
Using Embeddings for Both Entity Recognition and Linking in Tweets
The paper describes our submissions to the task on Named Entity rEcognition and Linking in Italian Tweets (NEEL-IT) at Evalita 2016. Our approach relies on a Named Entity tagging technique that exploits both character-level and word-level embeddings. Character-based embeddings allow learning the idiosyncrasies of the language used in tweets. Using a full-blown Named Entity tagger allows recognizing a wider range of entities than just those known from their presence in a Knowledge Base or gazetteer. Our submissions achieved the first, second, and fourth best official scores.
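To illustrate why combining the two levels helps with tweet spelling, here is a minimal sketch of a token representation that concatenates a word embedding with a character-level vector. The lookup tables are random toy values, the dimensions are arbitrary, and mean-pooling of character embeddings is a simplification standing in for the learned character encoder a real tagger would use; none of this reproduces the authors' system.

```python
import random
from typing import Dict, List

WORD_DIM, CHAR_DIM = 4, 3
rng = random.Random(0)  # fixed seed so the toy tables are reproducible

def rand_vec(dim: int) -> List[float]:
    return [rng.uniform(-1, 1) for _ in range(dim)]

# Hypothetical toy lookup tables; a real tagger learns these during training.
WORD_EMB: Dict[str, List[float]] = {w: rand_vec(WORD_DIM) for w in ["roma", "juventus", "stasera"]}
UNK = [0.0] * WORD_DIM  # vector used for out-of-vocabulary words
CHAR_EMB: Dict[str, List[float]] = {c: rand_vec(CHAR_DIM) for c in "abcdefghijklmnopqrstuvwxyz#@_0123456789"}

def char_vector(token: str) -> List[float]:
    """Mean of character embeddings (a stand-in for a char-level BiLSTM/CNN)."""
    vecs = [CHAR_EMB[c] for c in token.lower() if c in CHAR_EMB]
    if not vecs:
        return [0.0] * CHAR_DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def token_features(token: str) -> List[float]:
    """Concatenate the word-level vector (or UNK) with the character-level vector."""
    word = WORD_EMB.get(token.lower(), UNK)
    return word + char_vector(token)

print(len(token_features("#juve")))              # 7
print(token_features("#juve")[:WORD_DIM])        # [0.0, 0.0, 0.0, 0.0]
```

An out-of-vocabulary spelling such as "#juve" gets a zero word vector but still carries character information that a downstream tagger can use.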
A Reverse Approach to Named Entity Extraction and Linking in Microposts
In this paper, we present a pipeline for named entity extraction and linking designed specifically for noisy, grammatically inconsistent domains where traditional named entity techniques perform poorly. Our approach leverages a large knowledge base to improve entity recognition, while retaining traditional NER to identify mentions that are not co-referent with any entity in the knowledge base.
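The "reverse" ordering, KB lookup first and NER second, can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline: the greedy longest-match lookup, the max_len of 4 tokens, and the rule that NER spans are kept only where they do not overlap a KB match are assumptions, and the NER spans are passed in rather than produced by a real tagger.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # token offsets, end exclusive

def kb_mentions(tokens: List[str], surface_forms: Set[str], max_len: int = 4) -> List[Span]:
    """Greedy longest-match of token n-grams against lowercased KB surface forms."""
    lowered = [t.lower() for t in tokens]
    spans: List[Span] = []
    i = 0
    while i < len(lowered):
        for n in range(min(max_len, len(lowered) - i), 0, -1):
            if " ".join(lowered[i:i + n]) in surface_forms:
                spans.append((i, i + n))
                i += n
                break
        else:
            i += 1  # no KB match starts at this token
    return spans

def overlaps(a: Span, b: Span) -> bool:
    return a[0] < b[1] and b[0] < a[1]

def merge_mentions(kb_spans: List[Span], ner_spans: List[Span]) -> List[Span]:
    """Keep every KB span; add NER spans only where no KB span overlaps them."""
    extra = [s for s in ner_spans if not any(overlaps(s, k) for k in kb_spans)]
    return sorted(kb_spans + extra)

tokens = "watching Game of Thrones with Jon tonight".split()
kb = kb_mentions(tokens, {"game of thrones"})  # hypothetical KB surface forms
ner = [(2, 4), (5, 6)]                         # hypothetical NER tagger output
print(merge_mentions(kb, ner))                 # [(1, 4), (5, 6)]
```

NER spans that survive the merge, such as "Jon" here, are the natural candidates for mentions with no KB entry.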