3,971 research outputs found
Determine the User Country of a Tweet
In the widely used message platform Twitter, about 2% of the tweets contains
the geographical location through exact GPS coordinates (latitude and
longitude). Knowing the location of a tweet is useful for many data analytics
questions. This research is looking at the determination of a location for
tweets that do not contain GPS coordinates. An accuracy of 82% was achieved
using a Naive Bayes model trained on features such as the users' timezone, the
user's language, and the parsed user location. The classifier performs well on
active Twitter countries such as the Netherlands and United Kingdom. An
analysis of errors made by the classifier shows that mistakes were made due to
limited information and shared properties between countries such as shared
timezone. A feature analysis was performed in order to see the effect of
different features. The features timezone and parsed user location were the
most informative features.Comment: CTIT Technical Report, University of Twent
A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and point-of-interests, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on daily basis. Due to the world-wide coverage of its users and
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts are spent on dealing
with new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim at offering an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we also briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions.Comment: Accepted to TKDE. 30 pages, 1 figur
Analysis of Home Location Estimation with Iteration on Twitter Following Relationship
User's home locations are used by numerous social media applications, such as
social media analysis. However, since the user's home location is not generally
open to the public, many researchers have been attempting to develop a more
accurate home location estimation. A social network that expresses
relationships between users is used to estimate the users' home locations. The
network-based home location estimation method with iteration, which propagates
the estimated locations, is used to estimate more users' home locations. In
this study, we analyze the function of network-based home location estimation
with iteration while using the social network based on following relationships
on Twitter. The results indicate that the function that selects the most
frequent location among the friends' location has the best accuracy. Our
analysis also shows that the 88% of users, who are in the social network based
on following relationships, has at least one correct home location within
one-hop (friends and friends of friends). According to this characteristic of
the social network, we indicate that twice is sufficient for iteration.Comment: The 2016 International Conference on Advanced Informatics: Concepts,
Theory and Application (ICAICTA2016
Inferring the Origin Locations of Tweets with Quantitative Confidence
Social Internet content plays an increasingly critical role in many domains,
including public health, disaster management, and politics. However, its
utility is limited by missing geographic information; for example, fewer than
1.6% of Twitter messages (tweets) contain a geotag. We propose a scalable,
content-based approach to estimate the location of tweets using a novel yet
simple variant of gaussian mixture models. Further, because real-world
applications depend on quantified uncertainty for such estimates, we propose
novel metrics of accuracy, precision, and calibration, and we evaluate our
approach accordingly. Experiments on 13 million global, comprehensively
multi-lingual tweets show that our approach yields reliable, well-calibrated
results competitive with previous computationally intensive methods. We also
show that a relatively small number of training data are required for good
estimates (roughly 30,000 tweets) and models are quite time-invariant
(effective on tweets many weeks newer than the training set). Finally, we show
that toponyms and languages with small geographic footprint provide the most
useful location signals.Comment: 14 pages, 6 figures. Version 2: Move mathematics to appendix, 2 new
references, various other presentation improvements. Version 3: Various
presentation improvements, accepted at ACM CSCW 201
A survey of location inference techniques on Twitter
The increasing popularity of the social networking service, Twitter, has made it more involved in day-to-day communications, strengthening social relationships and information dissemination. Conversations on Twitter are now being explored as indicators within early warning systems to alert of imminent natural disasters such as earthquakes and aid prompt emergency responses to crime. Producers are privileged to have limitless access to market perception from consumer comments on social media and microblogs. Targeted advertising can be made more effective based on user profile information such as demography, interests and location. While these applications have proven beneficial, the ability to effectively infer the location of Twitter users has even more immense value. However, accurately identifying where a message originated from or an author’s location remains a challenge, thus essentially driving research in that regard. In this paper, we survey a range of techniques applied to infer the location of Twitter users from inception to state of the art. We find significant improvements over time in the granularity levels and better accuracy with results driven by refinements to algorithms and inclusion of more spatial features
Large scale homophily analysis in twitter using a twixonomy
In this paper we perform a large-scale homophily analysis on Twitter using a hierarchical representation of users' interests which we call a Twixonomy. In order to build a population, community, or single-user Twixonomy we first associate "topical" friends in users' friendship lists (i.e. friends representing an interest rather than a social relation between peers) with Wikipedia categories. A wordsense disambiguation algorithm is used to select the appropriate wikipage for each topical friend. Starting from the set of wikipages representing "primitive" interests, we extract all paths connecting these pages with topmost Wikipedia category nodes, and we then prune the resulting graph G efficiently so as to induce a direct acyclic graph. This graph is the Twixonomy. Then, to analyze homophily, we compare different methods to detect communities in a peer friends Twitter network, and then for each community we compute the degree of homophily on the basis of a measure of pairwise semantic similarity. We show that the Twixonomy provides a means for describing users' interests in a compact and readable way and allows for a fine-grained homophily analysis. Furthermore, we show that midlow level categories in the Twixonomy represent the best balance between informativeness and compactness of the representation
Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges.Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201
A Data-driven Study of Influences in Twitter Communities
This paper presents a quantitative study of Twitter, one of the most popular
micro-blogging services, from the perspective of user influence. We crawl
several datasets from the most active communities on Twitter and obtain 20.5
million user profiles, along with 420.2 million directed relations and 105
million tweets among the users. User influence scores are obtained from
influence measurement services, Klout and PeerIndex. Our analysis reveals
interesting findings, including non-power-law influence distribution, strong
reciprocity among users in a community, the existence of homophily and
hierarchical relationships in social influences. Most importantly, we observe
that whether a user retweets a message is strongly influenced by the first of
his followees who posted that message. To capture such an effect, we propose
the first influencer (FI) information diffusion model and show through
extensive evaluation that compared to the widely adopted independent cascade
model, the FI model is more stable and more accurate in predicting influence
spreads in Twitter communities.Comment: 11 page
- …