Analysis of Home Location Estimation with Iteration on Twitter Following Relationship
Users' home locations are used by numerous social media applications, such as
social media analysis. However, since a user's home location is generally not
public, many researchers have attempted to develop more accurate home location
estimation methods. A social network that expresses relationships between users
can be used to estimate users' home locations, and network-based estimation
with iteration, which propagates the estimated locations through the network,
extends estimates to more users. In this study, we analyze the functions used
in network-based home location estimation with iteration on a social network
built from following relationships on Twitter. The results indicate that the
function that selects the most frequent location among a user's friends'
locations achieves the best accuracy. Our analysis also shows that 88% of users
in the following-relationship network have at least one user with the correct
home location within one hop (friends and friends of friends). Given this
characteristic of the network, we show that two iterations are sufficient.
Comment: The 2016 International Conference on Advanced Informatics: Concepts,
Theory and Application (ICAICTA2016)
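The iterative, network-based estimation described above can be sketched as label propagation with a most-frequent-location vote. This is a minimal illustration, not the authors' implementation: the adjacency-dict input format, the function name propagate_home_locations, and the synchronous update scheme are assumptions, and ties are broken by whichever location is counted first.

```python
from collections import Counter


def propagate_home_locations(friends, known, iterations=2):
    """Estimate home locations by repeated most-frequent-location votes.

    friends: hypothetical adjacency dict, user -> list of friend users
    known: user -> home location, for users whose location is known
    iterations: number of propagation rounds
    """
    estimates = dict(known)
    for _ in range(iterations):
        updated = dict(estimates)  # synchronous update: read old round, write new
        for user, neighbours in friends.items():
            if user in known:
                continue  # keep known locations fixed
            votes = Counter(estimates[f] for f in neighbours if f in estimates)
            if votes:
                # most frequent location among the friends' current estimates
                updated[user] = votes.most_common(1)[0][0]
        estimates = updated
    return estimates


# Toy graph (hypothetical): u5 is only connected to u1, whose location is unknown.
friends = {
    "u1": ["u2", "u3", "u4"],
    "u2": ["u1"], "u3": ["u1"], "u4": ["u1"],
    "u5": ["u1"],
}
known = {"u2": "Tokyo", "u3": "Tokyo", "u4": "Osaka"}
print(propagate_home_locations(friends, known, iterations=1))
print(propagate_home_locations(friends, known, iterations=2))
```

In the toy graph, one round estimates only u1 (two Tokyo votes against one Osaka vote); u5 is reached in the second round, once u1 has an estimate to propagate.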
Determine the User Country of a Tweet
In the widely used messaging platform Twitter, about 2% of tweets contain
their geographical location as exact GPS coordinates (latitude and longitude).
Knowing the location of a tweet is useful for many data analytics questions.
This research addresses determining the location of tweets that do not contain
GPS coordinates. An accuracy of 82% was achieved using a Naive Bayes model
trained on features such as the user's timezone, the user's language, and the
parsed user location. The classifier performs well for countries that are
active on Twitter, such as the Netherlands and the United Kingdom. An analysis
of the classifier's errors shows that mistakes were caused by limited
information and by properties shared between countries, such as a shared
timezone. A feature analysis was performed to assess the effect of the
different features; timezone and parsed user location were the most
informative.
Comment: CTIT Technical Report, University of Twente
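A categorical Naive Bayes classifier over features like those named above can be sketched in a few lines. This is a hypothetical illustration rather than the report's model: the feature keys (timezone, lang, location), the add-one smoothing, and the toy training rows are assumptions, and the location value is used as a raw string instead of a parsed location.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # feature_counts[feature][label][value] -> count
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        self.feature_values = defaultdict(set)  # distinct values seen per feature
        for row, label in zip(rows, labels):
            for feature, value in row.items():
                self.feature_counts[feature][label][value] += 1
                self.feature_values[feature].add(value)
        return self

    def log_score(self, row, label):
        """Unnormalised log posterior: log P(label) + sum of log P(value | label)."""
        score = math.log(self.class_counts[label] / self.total)
        for feature, value in row.items():
            counts = self.feature_counts[feature][label]
            # +1 reserves probability mass for values never seen in training
            n_values = len(self.feature_values[feature]) + 1
            score += math.log((counts[value] + 1) / (sum(counts.values()) + n_values))
        return score

    def predict(self, row):
        return max(self.class_counts, key=lambda label: self.log_score(row, label))


# Toy training data (hypothetical), one row per user.
rows = [
    {"timezone": "Amsterdam", "lang": "nl", "location": "Utrecht"},
    {"timezone": "Amsterdam", "lang": "nl", "location": "Amsterdam"},
    {"timezone": "London", "lang": "en", "location": "London"},
    {"timezone": "London", "lang": "en", "location": "Manchester"},
]
labels = ["NL", "NL", "GB", "GB"]
model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict({"timezone": "Amsterdam", "lang": "nl", "location": "Rotterdam"}))
```

"Rotterdam" never appears in the toy training rows, so its smoothed likelihood is the same for both countries and the timezone and language features decide the prediction. This also hints at the error mode noted above: when countries share a timezone and the location is uninformative, little evidence remains to separate them.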
Understanding Citizen Reactions and Ebola-Related Information Propagation on Social Media
During severe outbreaks such as Ebola, bird flu, and SARS, people share news,
thoughts, and responses regarding the outbreaks on social media. It is
therefore important to understand how people perceive such outbreaks, how they
respond, and what factors affect these responses. In this paper, we conduct a
comprehensive study to understand and mine the spread of Ebola-related
information on social media. In particular, we (i) conduct a large-scale
data-driven analysis of geotagged social media messages to understand citizen
reactions regarding Ebola; (ii) build information propagation models that
measure the locality of information; and (iii) analyze spatial, temporal, and
social properties of Ebola-related information. Our work provides new insights
into the Ebola outbreak by characterizing citizen reactions and topic-based
information propagation, and provides a foundation for analyzing and responding
to future public health crises.
Comment: 2016 IEEE/ACM International Conference on Advances in Social Networks
Analysis and Mining (ASONAM 2016)
Data Quality Challenges in Twitter Content Analysis for Informing Policy Making in Health Care
Social media platforms and microblogs have become popular fora where the general public expresses opinions and concerns on a variety of matters. As a result, private and public organizations have been looking into ways of finding, understanding, and communicating insights extracted from this massive amount of interconnected, text-based data. There are, however, important difficulties associated with the noisiness and limited reliability of the content that hinder the analysis of the data. This paper reports the main challenges encountered in a real-world experience using social media as a source of data to support policy making and assessment. We also propose a set of strategies for the precise retrieval of data, the profiling of social media users, and the involvement of policy makers in the analytical process.
Inferring the Origin Locations of Tweets with Quantitative Confidence
Social Internet content plays an increasingly critical role in many domains,
including public health, disaster management, and politics. However, its
utility is limited by missing geographic information; for example, fewer than
1.6% of Twitter messages (tweets) contain a geotag. We propose a scalable,
content-based approach to estimate the location of tweets using a novel yet
simple variant of Gaussian mixture models. Further, because real-world
applications depend on quantified uncertainty for such estimates, we propose
novel metrics of accuracy, precision, and calibration, and we evaluate our
approach accordingly. Experiments on 13 million global, comprehensively
multi-lingual tweets show that our approach yields reliable, well-calibrated
results competitive with previous computationally intensive methods. We also
show that relatively little training data is required for good estimates
(roughly 30,000 tweets) and that the models are quite time-invariant
(effective on tweets many weeks newer than the training set). Finally, we show
that toponyms and languages with a small geographic footprint provide the most
useful location signals.
Comment: 14 pages, 6 figures. Version 2: Move mathematics to appendix, 2 new
references, various other presentation improvements. Version 3: Various
presentation improvements, accepted at ACM CSCW 201
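To illustrate how a Gaussian mixture can turn tweet content into a location estimate with uncertainty attached, the sketch below fits nothing; it assumes each token already has an axis-aligned 2D Gaussian over (latitude, longitude) and mixes the Gaussians of the tokens present in a tweet. It is not the paper's variant: the TokenGaussian class, the token models, and the weighting heuristic (weight inversely proportional to a Gaussian's spread, echoing the finding that small-footprint signals are most useful) are all hypothetical, and treating latitude and longitude as a flat plane is a simplification.

```python
import math
from dataclasses import dataclass


@dataclass
class TokenGaussian:
    """Hypothetical per-token location model: axis-aligned 2D Gaussian in degrees."""
    lat: float
    lon: float
    sd_lat: float
    sd_lon: float

    @property
    def weight(self):
        # Assumed heuristic: tokens with a small geographic footprint get more weight
        return 1.0 / (self.sd_lat * self.sd_lon)

    def pdf(self, lat, lon):
        z_lat = (lat - self.lat) / self.sd_lat
        z_lon = (lon - self.lon) / self.sd_lon
        return math.exp(-0.5 * (z_lat ** 2 + z_lon ** 2)) / (
            2 * math.pi * self.sd_lat * self.sd_lon)


def mixture_for_tweet(tokens, models):
    """Return (normalised weight, component) pairs for tokens that have a model."""
    components = [models[t] for t in tokens if t in models]
    total = sum(c.weight for c in components)
    return [(c.weight / total, c) for c in components]


def mixture_pdf(mixture, lat, lon):
    """Density of the tweet's location distribution at (lat, lon)."""
    return sum(w * c.pdf(lat, lon) for w, c in mixture)


def mixture_mean(mixture):
    """Weighted mean of component centres; None if no token had a model."""
    if not mixture:
        return None
    return (sum(w * c.lat for w, c in mixture),
            sum(w * c.lon for w, c in mixture))


# Hypothetical token models: a city name, a Dutch word, and a near-uniform stop word.
models = {
    "amsterdam": TokenGaussian(52.37, 4.90, 0.2, 0.2),
    "fiets": TokenGaussian(52.1, 5.1, 1.0, 1.0),
    "the": TokenGaussian(40.0, 0.0, 30.0, 60.0),
}
mix = mixture_for_tweet(["the", "fiets", "in", "amsterdam"], models)
print(mixture_mean(mix))
print(mixture_pdf(mix, 52.37, 4.90) > mixture_pdf(mix, 0.0, 0.0))
```

The weighted mean is only a convenient point estimate; for a multimodal mixture it can fall between distant modes, so the full density returned by mixture_pdf is what carries the uncertainty that calibration metrics would evaluate.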
An Analysis on the Spatial Characteristics of Satisfaction on the Residential Environment Using Tweets
The purpose of this study is to analyze regional differences in the spatial distribution of residential satisfaction by extracting elements of residential satisfaction from the text of tweets. We defined three themes, “safety”, “amenity”, and “convenience”, and base search terms for each theme. We then refined the search terms for each base search term in order to retrieve tweets related to satisfaction with residential environments. We analyzed the selected tweets, visualized the results on a map, and then investigated satisfaction with residential environments by region through an index analysis, in which the index is the ratio of a theme's tweet share to the overall tweet share in each region. This study suggests that analyzing tweets on social networking services (SNS) may replace offline surveys for investigating satisfaction with residential environments by region in South Korea.
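One reading of the index described above is a location quotient: a region's share of a theme's tweets divided by that region's share of all tweets, where values above 1 mean the theme is over-represented there. The sketch below assumes that reading; the function name theme_index, the input dictionaries, and the region names and counts are hypothetical toy values, not data from the study.

```python
def theme_index(theme_counts, total_counts):
    """Location-quotient-style index per region and theme (assumed definition).

    theme_counts: region -> {theme: number of tweets on that theme}
    total_counts: region -> number of all tweets collected in that region
    index = (region's share of the theme's tweets) / (region's share of all tweets)
    """
    grand_total = sum(total_counts.values())
    theme_totals = {}
    for region_counts in theme_counts.values():
        for theme, n in region_counts.items():
            theme_totals[theme] = theme_totals.get(theme, 0) + n

    index = {}
    for region, region_counts in theme_counts.items():
        region_share = total_counts[region] / grand_total
        index[region] = {
            # guard against themes with no tweets anywhere
            theme: (n / theme_totals[theme]) / region_share if theme_totals[theme] else 0.0
            for theme, n in region_counts.items()
        }
    return index


# Toy counts (hypothetical).
total_counts = {"Seoul": 800, "Busan": 200}
theme_counts = {
    "Seoul": {"safety": 40, "amenity": 60},
    "Busan": {"safety": 20, "amenity": 10},
}
print(theme_index(theme_counts, total_counts))
```

With these toy counts, Busan holds 20% of all tweets but a third of the safety tweets, giving a safety index of about 1.67, so safety would be read as over-represented there. Algebraically the same quotient equals the theme's share of a region's tweets divided by the theme's share of all tweets, so either phrasing of the ratio gives the same value.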