1,070 research outputs found
A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and point-of-interests, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on daily basis. Due to the world-wide coverage of its users and
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts are spent on dealing
with new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim at offering an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we also briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions.Comment: Accepted to TKDE. 30 pages, 1 figur
Named Entity Recognition in Twitter using Images and Text
Named Entity Recognition (NER) is an important subtask of information
extraction that seeks to locate and recognise named entities. Despite recent
achievements, we still face limitations with correctly detecting and
classifying entities, prominently in short and noisy text, such as Twitter. An
important negative aspect in most of NER approaches is the high dependency on
hand-crafted features and domain-specific knowledge, necessary to achieve
state-of-the-art results. Thus, devising models to deal with such
linguistically complex contexts is still challenging. In this paper, we propose
a novel multi-level architecture that does not rely on any specific linguistic
resource or encoded rule. Unlike traditional approaches, we use features
extracted from images and text to classify named entities. Experimental tests
against state-of-the-art NER for Twitter on the Ritter dataset present
competitive results (0.59 F-measure), indicating that this approach may lead
towards better NER models.Comment: The 3rd International Workshop on Natural Language Processing for
Informal Text (NLPIT 2017), 8 page
Experiments to Improve Named Entity Recognition on Turkish Tweets
Social media texts are significant information sources for several
application areas including trend analysis, event monitoring, and opinion
mining. Unfortunately, existing solutions for tasks such as named entity
recognition that perform well on formal texts usually perform poorly when
applied to social media texts. In this paper, we report on experiments that
have the purpose of improving named entity recognition on Turkish tweets, using
two different annotated data sets. In these experiments, starting with a
baseline named entity recognition system, we adapt its recognition rules and
resources to better fit Twitter language by relaxing its capitalization
constraint and by diacritics-based expansion of its lexical resources, and we
employ a simplistic normalization scheme on tweets to observe the effects of
these on the overall named entity recognition performance on Turkish tweets.
The evaluation results of the system with these different settings are provided
with discussions of these results.Comment: appears in Proceedings of the EACL Workshop on Language Analysis for
Social Media, 201
MoNoise: Modeling Noise Using a Modular Normalization System
We propose MoNoise: a normalization model focused on generalizability and
efficiency, it aims at being easily reusable and adaptable. Normalization is
the task of translating texts from a non- canonical domain to a more canonical
domain, in our case: from social media data to standard language. Our proposed
model is based on a modular candidate generation in which each module is
responsible for a different type of normalization action. The most important
generation modules are a spelling correction system and a word embeddings
module. Depending on the definition of the normalization task, a static lookup
list can be crucial for performance. We train a random forest classifier to
rank the candidates, which generalizes well to all different types of
normaliza- tion actions. Most features for the ranking originate from the
generation modules; besides these features, N-gram features prove to be an
important source of information. We show that MoNoise beats the
state-of-the-art on different normalization benchmarks for English and Dutch,
which all define the task of normalization slightly different.Comment: Source code: https://bitbucket.org/robvanderg/monois
Named Entity Recognition on Turkish Tweets
Various recent studies show that the performance of named entity recognition (NER) systems developed for well-formed text types drops significantly when applied to tweets. The only existing study for the highly inflected agglutinative language Turkish reports a drop in F-Measure
from 91% to 19% when ported from news articles to tweets. In this study, we present a new named entity-annotated tweet corpus and a detailed analysis of the various tweet-specific linguistic phenomena. We perform comparative NER experiments with a rule-based multilingual NER system adapted to Turkish on three corpora: a news corpus, our new tweet corpus, and another tweet corpus. Based on the analysis and the experimentation results, we suggest system features required to improve NER results for social media like Twitter.JRC.G.2-Global security and crisis managemen
- …