1,549 research outputs found

    A Survey of Location Prediction on Twitter

    Full text link
    Locations, e.g., countries, states, cities, and point-of-interests, are central to news, emergency events, and people's daily lives. Automatic identification of locations associated with or mentioned in documents has been explored for decades. As one of the most popular online social network platforms, Twitter has attracted a large number of users who send millions of tweets on daily basis. Due to the world-wide coverage of its users and real-time freshness of tweets, location prediction on Twitter has gained significant attention in recent years. Research efforts are spent on dealing with new challenges and opportunities brought by the noisy, short, and context-rich nature of tweets. In this survey, we aim at offering an overall picture of location prediction on Twitter. Specifically, we concentrate on the prediction of user home locations, tweet locations, and mentioned locations. We first define the three tasks and review the evaluation metrics. By summarizing Twitter network, tweet content, and tweet context as potential inputs, we then structurally highlight how the problems depend on these inputs. Each dependency is illustrated by a comprehensive review of the corresponding strategies adopted in state-of-the-art approaches. In addition, we also briefly review two related problems, i.e., semantic location prediction and point-of-interest recommendation. Finally, we list future research directions.Comment: Accepted to TKDE. 30 pages, 1 figur

    Neural approaches to sequence labeling for information extraction

    Get PDF
    Een belangrijk aspect binnen artificiële intelligentie (AI) is het interpreteren van menselijke taal uitgedrukt in tekstuele (geschreven) vorm: natural Language processing (NLP) is belangrijk gezien tekstuele informatie nuttig is voor veel toepassingen. Toch is het verstaan ervan (zogenaamde natural Language understanding, (NLU) een uitdaging, gezien de ongestructureerde vorm van tekst, waarvan de betekenis vaak dubbelzinnig en contextafhankelijk is. In dit proefschrift introduceren we oplossingen voor tekortkomingen van gerelateerd werk bij het behandelen van fundamentele taken in natuurlijke taalverwerking, zoals named entity recognition (i.e. het identificeren van de entiteiten die in een zin voorkomen) en relatie-extractie (het identificeren van relaties tussen entiteiten). Vertrekkend van een specifiek probleem (met name het identificeren van de structuur van een huis aan de hand van een tekstueel zoekertje), bouwen we stapsgewijs een complete (geautomatiseerde) oplossing voor de bovengenoemde taken, op basis van neutrale netwerkarchitecturen. Onze oplossingen zijn algemeen toepasbaar op verschillende toepassingsdomeinen en talen. We beschouwen daarnaast ook de taak van het identificeren van relevante gebeurtenissen tijdens een evenement (bv. een doelpunt tijdens een voetbalwedstrijd), in informatiestromen op Twitter. Meer bepaald formuleren we dit probleem als het labelen van woord sequenties (vergelijkbaar met named entity recognition), waarbij we de chronologische relatie tussen opeenvolgende tweets benutten

    NEED4Tweet: a Twitterbot for tweets named entity extraction and disambiguation

    Get PDF
    In this demo paper, we present NEED4Tweet, a Twitterbot for named entity extraction (NEE) and disambiguation (NED) for Tweets. The straightforward application of state-of-the-art extraction and disambiguation approaches on informal text widely used in Tweets, typically results in significantly degraded performance due to the lack of formal structure; the lack of sufficient context required; and the seldom entities involved. In this paper, we introduce a novel framework that copes with the introduced challenges. We rely on contextual and semantic features more than syntactic features which are less informative. We believe that disambiguation can help to improve the extraction process. This mimics the way humans understand language

    Named Entity Recognition on Turkish Tweets

    Get PDF
    Various recent studies show that the performance of named entity recognition (NER) systems developed for well-formed text types drops significantly when applied to tweets. The only existing study for the highly inflected agglutinative language Turkish reports a drop in F-Measure from 91% to 19% when ported from news articles to tweets. In this study, we present a new named entity-annotated tweet corpus and a detailed analysis of the various tweet-specific linguistic phenomena. We perform comparative NER experiments with a rule-based multilingual NER system adapted to Turkish on three corpora: a news corpus, our new tweet corpus, and another tweet corpus. Based on the analysis and the experimentation results, we suggest system features required to improve NER results for social media like Twitter.JRC.G.2-Global security and crisis managemen

    Experiments to Improve Named Entity Recognition on Turkish Tweets

    Full text link
    Social media texts are significant information sources for several application areas including trend analysis, event monitoring, and opinion mining. Unfortunately, existing solutions for tasks such as named entity recognition that perform well on formal texts usually perform poorly when applied to social media texts. In this paper, we report on experiments that have the purpose of improving named entity recognition on Turkish tweets, using two different annotated data sets. In these experiments, starting with a baseline named entity recognition system, we adapt its recognition rules and resources to better fit Twitter language by relaxing its capitalization constraint and by diacritics-based expansion of its lexical resources, and we employ a simplistic normalization scheme on tweets to observe the effects of these on the overall named entity recognition performance on Turkish tweets. The evaluation results of the system with these different settings are provided with discussions of these results.Comment: appears in Proceedings of the EACL Workshop on Language Analysis for Social Media, 201

    Uncertainty handling in named entity extraction and disambiguation for informal text

    Get PDF
    Social media content represents a large portion of all textual content appearing on the Internet. These streams of user generated content (UGC) provide an opportunity and challenge for media analysts to analyze huge amount of new data and use them to infer and reason with new information. A main challenge of natural language is its ambiguity and vagueness. To automatically resolve ambiguity, the grammatical structure of sentences is used. However, when we move to informal language widely used in social media, the language becomes more ambiguous and thus more challenging for automatic understanding.\ud Information Extraction (IE) is the research field that enables the use of unstructured text in a structured way. Named Entity Extraction (NEE) is a sub task of IE that aims to locate phrases (mentions) in the text that represent names of entities such as persons, organizations or locations regardless of their type. Named Entity Disambiguation (NED) is the task of determining which correct person, place, event, etc. is referred to by a mention.\ud The goal of this paper is to provide an overview on some approaches that mimic the human way of recognition and disambiguation of named entities especially for domains that lack formal sentence structure. The proposed methods open the doors for more sophisticated applications based on users’ contributions on social media. We propose a robust combined framework for NEE and NED in semi-formal and informal text. The achieved robustness has been proven to be valid across languages and domains and to be independent of the selected extraction and disambiguation techniques. It is also shown to be robust against the informality of the used language. We have discovered a reinforcement effect and exploited it a technique that improves extraction quality by feeding back disambiguation results. We present a method of handling the uncertainty involved in extraction to improve the disambiguation results
    corecore