6 research outputs found

    Entity Linking for the Semantic Annotation of Italian Tweets

    Linking entity mentions in Italian tweets to concepts in a knowledge base is a challenging task, due to the short and noisy nature of these messages and the lack of specific resources for Italian. This paper proposes an adaptation of a general-purpose Named Entity Linking algorithm, which exploits a similarity measure computed over a Distributional Semantic Model, to the context of Italian tweets. To evaluate the proposed algorithm, we introduce a new dataset of tweets for entity linking that we have developed specifically for the Italian language.

    UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Linking Entities in Tweets

    This paper describes the participation of the UNIBA team in the Named Entity rEcognition and Linking (NEEL) Challenge. We propose a knowledge-based algorithm able to recognize and link named entities in English tweets. The approach combines the simple Lesk algorithm with information coming from both a distributional semantic model and the usage frequency of Wikipedia concepts. The algorithm performs poorly in entity recognition, while it achieves good results in the disambiguation step.
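
The disambiguation idea in the abstract above can be illustrated with a minimal sketch. It is not the UNIBA implementation: the toy two-dimensional word vectors, the candidate glosses, the prior values, the function names, and the equal weighting of similarity and prior are all assumptions made for illustration. Each candidate concept is scored by the cosine similarity between the tweet context and the concept gloss in a vector space (standing in for the distributional semantic model), combined with a usage-frequency prior.

```python
import math

# Hypothetical 2-d word vectors standing in for a distributional semantic model.
TOY_VECTORS = {
    "apple": [0.5, 0.5],
    "new": [0.5, 0.5],
    "iphone": [0.9, 0.1],
    "company": [1.0, 0.0],
    "technology": [0.8, 0.2],
    "fruit": [0.0, 1.0],
    "tree": [0.1, 0.9],
}

# Hypothetical candidates: concept -> (gloss text, usage-frequency prior).
CANDIDATES = {
    "Apple_Inc.": ("technology company iphone", 0.6),
    "Apple_(fruit)": ("fruit tree", 0.4),
}


def text_vector(text, vectors):
    """Sum the vectors of all known words in the text; unknown words are skipped."""
    dims = len(next(iter(vectors.values())))
    total = [0.0] * dims
    for token in text.lower().split():
        for i, value in enumerate(vectors.get(token, [])):
            total[i] += value
    return total


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def link_mention(context, candidates, vectors, weight=0.5):
    """Return the candidate concept with the highest combined score.

    The linear combination and the default weight are assumptions of this sketch.
    """
    context_vec = text_vector(context, vectors)

    def score(item):
        gloss, prior = item[1]
        similarity = cosine(context_vec, text_vector(gloss, vectors))
        return weight * similarity + (1 - weight) * prior

    return max(candidates.items(), key=score)[0]
```

With these toy values, a context such as `"new iphone from apple"` scores higher for the company concept, while `"apple tree fruit"` scores higher for the fruit concept even though its prior is lower.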

    Location Reference Recognition from Texts: A Survey and Comparison

    A vast amount of location information exists in unstructured texts, such as social media posts, news stories, scientific articles, web pages, travel blogs, and historical archives. Geoparsing refers to recognizing location references from texts and identifying their geospatial representations. While geoparsing can benefit many domains, a summary of its specific applications is still missing. Further, there is a lack of a comprehensive review and comparison of existing approaches for location reference recognition, which is the first and core step of geoparsing. To fill these research gaps, this review first summarizes seven typical application domains of geoparsing: geographic information retrieval, disaster management, disease surveillance, traffic management, spatial humanities, tourism management, and crime management. We then review existing approaches for location reference recognition by categorizing these approaches into four groups based on their underlying functional principle: rule-based, gazetteer matching–based, statistical learning–based, and hybrid approaches. Next, we thoroughly evaluate the correctness and computational efficiency of the 27 most widely used approaches for location reference recognition based on 26 public datasets with different types of texts (e.g., social media posts and news stories) containing 39,736 location references worldwide. Results from this thorough evaluation can help inform future methodological developments and can help guide the selection of proper approaches based on application needs.
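
Of the four groups named in the abstract above, gazetteer matching is the simplest to illustrate. The following is a minimal sketch, not one of the approaches evaluated in the survey: the function name, the three-word window, and the tiny gazetteer are assumptions. It scans the text left to right and greedily takes the longest span of words that appears in the gazetteer.

```python
import string


def find_location_references(text, gazetteer, max_words=3):
    """Greedy longest-match lookup of place names in a gazetteer.

    `gazetteer` is any iterable of place names; matching is case-insensitive.
    `max_words` caps the span length tried at each position (an assumption).
    """
    # Strip surrounding punctuation so "Paris." still matches "Paris".
    tokens = [t.strip(string.punctuation) for t in text.split()]
    tokens = [t for t in tokens if t]
    names = {name.lower() for name in gazetteer}

    found = []
    i = 0
    while i < len(tokens):
        # Try the longest span first so "New York" wins over "York".
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate.lower() in names:
                found.append(candidate)
                i += n
                break
        else:
            i += 1  # no gazetteer entry starts at this token
    return found
```

A sketch like this recognizes only names that are already listed and cannot tell a place from a homonym (for example, a person named Paris), which is one reason the survey also covers rule-based, statistical learning–based, and hybrid approaches.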

    Location Mention Prediction from Disaster Tweets

    While utilizing Twitter data for crisis management is of interest to different response authorities, a critical challenge that hinders the utilization of such data is the scarcity of automated tools that extract and resolve geolocation information. This dissertation focuses on the Location Mention Prediction (LMP) problem, which consists of the Location Mention Recognition (LMR) and Location Mention Disambiguation (LMD) tasks. Our work contributes to studying two main factors that influence the robustness of LMP systems: (i) the dataset used to train the model, and (ii) the learning model. As for the training dataset, we study the best training and evaluation strategies to exploit existing datasets and tools at the onset of disaster events. We emphasize that the size of training data matters and recommend considering the data domain, the disaster domain, and geographical proximity when training LMR models. We further construct the public IDRISI datasets, the largest English datasets to date and the first Arabic datasets for the LMP tasks. Rigorous analysis and experiments show that the IDRISI datasets are diverse and generalize across domains and geographic regions, compared to existing datasets. As for the learning models, the LMP tasks are understudied in the disaster management domain. To address this, we reformulate the LMR and LMD modeling and evaluation to better suit the requirements of the response authorities. Moreover, we introduce competitive and state-of-the-art LMR and LMD models that are compared against a representative set of baselines for both the Arabic and English languages.