186 research outputs found

    Towards Deep Semantic Analysis Of Hashtags

    Full text link
    Hashtags are semantico-syntactic constructs used across various social networking and microblogging platforms to enable users to start a topic specific discussion or classify a post into a desired category. Segmenting and linking the entities present within the hashtags could therefore help in better understanding and extraction of information shared across the social media. However, due to lack of space delimiters in the hashtags (e.g #nsavssnowden), the segmentation of hashtags into constituent entities ("NSA" and "Edward Snowden" in this case) is not a trivial task. Most of the current state-of-the-art social media analytics systems like Sentiment Analysis and Entity Linking tend to either ignore hashtags, or treat them as a single word. In this paper, we present a context aware approach to segment and link entities in the hashtags to a knowledge base (KB) entry, based on the context within the tweet. Our approach segments and links the entities in hashtags such that the coherence between hashtag semantics and the tweet is maximized. To the best of our knowledge, no existing study addresses the issue of linking entities in hashtags for extracting semantic information. We evaluate our method on two different datasets, and demonstrate the effectiveness of our technique in improving the overall entity linking in tweets via additional semantic information provided by segmenting and linking entities in a hashtag.Comment: To Appear in 37th European Conference on Information Retrieva

    A Reverse Approach to Named Entity Extraction and Linking in Microposts

    Get PDF
    ABSTRACT In this paper, we present a pipeline for named entity extraction and linking that is designed specifically for noisy, grammatically inconsistent domains where traditional named entity techniques perform poorly. Our approach leverages a large knowledge base to improve entity recognition, while maintaining the use of traditional NER to identify mentions that are not co-referent with any entities in the knowledge base

    MAG: A Multilingual, Knowledge-base Agnostic and Deterministic Entity Linking Approach

    Full text link
    Entity linking has recently been the subject of a significant body of research. Currently, the best performing approaches rely on trained mono-lingual models. Porting these approaches to other languages is consequently a difficult endeavor as it requires corresponding training data and retraining of the models. We address this drawback by presenting a novel multilingual, knowledge-based agnostic and deterministic approach to entity linking, dubbed MAG. MAG is based on a combination of context-based retrieval on structured knowledge bases and graph algorithms. We evaluate MAG on 23 data sets and in 7 languages. Our results show that the best approach trained on English datasets (PBOH) achieves a micro F-measure that is up to 4 times worse on datasets in other languages. MAG, on the other hand, achieves state-of-the-art performance on English datasets and reaches a micro F-measure that is up to 0.6 higher than that of PBOH on non-English languages.Comment: Accepted in K-CAP 2017: Knowledge Capture Conferenc

    NEED4Tweet: a Twitterbot for tweets named entity extraction and disambiguation

    Get PDF
    In this demo paper, we present NEED4Tweet, a Twitterbot for named entity extraction (NEE) and disambiguation (NED) for Tweets. The straightforward application of state-of-the-art extraction and disambiguation approaches on informal text widely used in Tweets, typically results in significantly degraded performance due to the lack of formal structure; the lack of sufficient context required; and the seldom entities involved. In this paper, we introduce a novel framework that copes with the introduced challenges. We rely on contextual and semantic features more than syntactic features which are less informative. We believe that disambiguation can help to improve the extraction process. This mimics the way humans understand language

    Information extraction for social media

    Get PDF
    The rapid growth in IT in the last two decades has led to a growth in the amount of information available online. A new style for sharing information is social media. Social media is a continuously instantly updated source of information. In this position paper, we propose a framework for Information Extraction (IE) from unstructured user generated contents on social media. The framework proposes solutions to overcome the IE challenges in this domain such as the short context, the noisy sparse contents and the uncertain contents. To overcome the challenges facing IE from social media, State-Of-The-Art approaches need to be adapted to suit the nature of social media posts. The key components and aspects of our proposed framework are noisy text filtering, named entity extraction, named entity disambiguation, feedback loops, and uncertainty handling

    GERBIL: General Entity Annotator Benchmarking Framework

    Get PDF
    We present GERBIL, an evaluation framework for semantic entity annotation. The rationale behind our framework is to provide developers, end users and researchers with easy-to-use interfaces that allow for the agile, fine-grained and uniform evaluation of annotation tools on multiple datasets. By these means, we aim to ensure that both tool developers and end users can derive meaningful insights pertaining to the extension, integration and use of annotation applications. In particular, GERBIL provides comparable results to tool developers so as to allow them to easily discover the strengths and weaknesses of their implementations with respect to the state of the art. With the permanent experiment URIs provided by our framework, we ensure the reproducibility and archiving of evaluation results. Moreover, the framework generates data in machine-processable format, allowing for the efficient querying and post-processing of evaluation results. Finally, the tool diagnostics provided by GERBIL allows deriving insights pertaining to the areas in which tools should be further refined, thus allowing developers to create an informed agenda for extensions and end users to detect the right tools for their purposes. GERBIL aims to become a focal point for the state of the art, driving the research agenda of the community by presenting comparable objective evaluation results

    Using Embeddings for Both Entity Recognition and Linking in Tweets

    Get PDF
    L’articolo descrive la nostra partecipazione al task di Named Entity rEcognition and Linking in Italian Tweets (NEEL-IT) a Evalita 2016. Il nostro approccio si basa sull’utilizzo di un Named Entity tagger che sfrutta embeddings sia character-level che word-level. I primi consentono di apprendere le idiosincrasie della scrittura nei tweet. L’uso di un tagger completo consente di riconoscere uno spettro più ampio di entità rispetto a quelle conosciute per la loro presenza in Knowledge Base o gazetteer. Le prove sottomesse hanno ottenuto il primo, secondo e quarto dei punteggi ufficiali.The paper describes our sub-missions to the task on Named Entity rEcognition and Linking in Italian Tweets (NEEL-IT) at Evalita 2016. Our approach relies on a technique of Named Entity tagging that exploits both charac-ter-level and word-level embeddings. Character-based embeddings allow learn-ing the idiosyncrasies of the language used in tweets. Using a full-blown Named Entity tagger allows recognizing a wider range of entities than those well known by their presence in a Knowledge Base or gazetteer. Our submissions achieved first, second and fourth top offi-cial scores
    • …
    corecore