6,754 research outputs found
A meta-analysis of state-of-the-art electoral prediction from Twitter data
Electoral prediction from Twitter data is an appealing research topic. It
seems relatively straightforward and the prevailing view is overly optimistic.
This is problematic because while simple approaches are assumed to be good
enough, core problems are not addressed. Thus, this paper aims to (1) provide a
balanced and critical review of the state of the art; (2) cast light on the
presume predictive power of Twitter data; and (3) depict a roadmap to push
forward the field. Hence, a scheme to characterize Twitter prediction methods
is proposed. It covers every aspect from data collection to performance
evaluation, through data processing and vote inference. Using that scheme,
prior research is analyzed and organized to explain the main approaches taken
up to date but also their weaknesses. This is the first meta-analysis of the
whole body of research regarding electoral prediction from Twitter data. It
reveals that its presumed predictive power regarding electoral prediction has
been rather exaggerated: although social media may provide a glimpse on
electoral outcomes current research does not provide strong evidence to support
it can replace traditional polls. Finally, future lines of research along with
a set of requirements they must fulfill are provided.Comment: 19 pages, 3 table
Towards Real-Time, Country-Level Location Classification of Worldwide Tweets
In contrast to much previous work that has focused on location classification
of tweets restricted to a specific country, here we undertake the task in a
broader context by classifying global tweets at the country level, which is so
far unexplored in a real-time scenario. We analyse the extent to which a
tweet's country of origin can be determined by making use of eight
tweet-inherent features for classification. Furthermore, we use two datasets,
collected a year apart from each other, to analyse the extent to which a model
trained from historical tweets can still be leveraged for classification of new
tweets. With classification experiments on all 217 countries in our datasets,
as well as on the top 25 countries, we offer some insights into the best use of
tweet-inherent features for an accurate country-level classification of tweets.
We find that the use of a single feature, such as the use of tweet content
alone -- the most widely used feature in previous work -- leaves much to be
desired. Choosing an appropriate combination of both tweet content and metadata
can actually lead to substantial improvements of between 20\% and 50\%. We
observe that tweet content, the user's self-reported location and the user's
real name, all of which are inherent in a tweet and available in a real-time
scenario, are particularly useful to determine the country of origin. We also
experiment on the applicability of a model trained on historical tweets to
classify new tweets, finding that the choice of a particular combination of
features whose utility does not fade over time can actually lead to comparable
performance, avoiding the need to retrain. However, the difficulty of achieving
accurate classification increases slightly for countries with multiple
commonalities, especially for English and Spanish speaking countries.Comment: Accepted for publication in IEEE Transactions on Knowledge and Data
Engineering (IEEE TKDE
The ISIS Twitter census: defining and describing the population of ISIS supporters on Twitter
Presents a demographic snapshot of ISIS supporters on Twitter by analysing a sample of 20,000 ISIS-supporting Twitter accounts, mapping the locations, preferred languages, and the number and type of followers of these accounts.
Overview
Although much ink has been spilled on ISIS’s activity on Twitter, very basic questions about the group’s social media strategy remain unanswered. In a new analysis paper, J.M. Berger and Jonathon Morgan answer fundamental questions about how many Twitter users support ISIS, who and where they are, and how they participate in its highly organized online activities.
Previous analyses of ISIS’s Twitter reach have relied on limited segments of the overall ISIS social network. The small, cellular nature of that network—and the focus on particular subsets within the network such as foreign fighters—may create misleading conclusions. This information vacuum extends to discussions of how the West should respond to the group’s online campaigns.
Berger and Morgan present a demographic snapshot of ISIS supporters on Twitter by analyzing a sample of 20,000 ISIS-supporting Twitter accounts. Using a sophisticated and innovative methodology, the authors map the locations, preferred languages, and the number and type of followers of these accounts.
Among the key findings:
From September through December 2014, the authors estimate that at least 46,000 Twitter accounts were used by ISIS supporters, although not all of them were active at the same time.
Typical ISIS supporters were located within the organization’s territories in Syria and Iraq, as well as in regions contested by ISIS. Hundreds of ISIS-supporting accounts sent tweets with location metadata embedded.
Almost one in five ISIS supporters selected English as their primary language when using Twitter. Three quarters selected Arabic.
ISIS-supporting accounts had an average of about 1,000 followers each, considerably higher than an ordinary Twitter user. ISIS-supporting accounts were also considerably more active than non-supporting users.
A minimum of 1,000 ISIS-supporting accounts were suspended by Twitter between September and December 2014. Accounts that tweeted most often and had the most followers were most likely to be suspended.
Much of ISIS’s social media success can be attributed to a relatively small group of hyperactive users, numbering between 500 and 2,000 accounts, which tweet in concentrated bursts of high volume.
Based on their key findings, the authors recommend social media companies and the U.S government work together to devise appropriate responses to extremism on social media. Approaches to the problem of extremist use of social media, Berger and Morgan contend, are most likely to succeed when they are mainstreamed into wider dialogues among the broad range of community, private, and public stakeholders
Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges.Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201
Measuring relative opinion from location-based social media: A case study of the 2016 U.S. presidential election
Social media has become an emerging alternative to opinion polls for public
opinion collection, while it is still posing many challenges as a passive data
source, such as structurelessness, quantifiability, and representativeness.
Social media data with geotags provide new opportunities to unveil the
geographic locations of users expressing their opinions. This paper aims to
answer two questions: 1) whether quantifiable measurement of public opinion can
be obtained from social media and 2) whether it can produce better or
complementary measures compared to opinion polls. This research proposes a
novel approach to measure the relative opinion of Twitter users towards public
issues in order to accommodate more complex opinion structures and take
advantage of the geography pertaining to the public issues. To ensure that this
new measure is technically feasible, a modeling framework is developed
including building a training dataset by adopting a state-of-the-art approach
and devising a new deep learning method called Opinion-Oriented Word Embedding.
With a case study of the tweets selected for the 2016 U.S. presidential
election, we demonstrate the predictive superiority of our relative opinion
approach and we show how it can aid visual analytics and support opinion
predictions. Although the relative opinion measure is proved to be more robust
compared to polling, our study also suggests that the former can advantageously
complement the later in opinion prediction
- …