Can electoral popularity be predicted using socially generated big data?
Today, our more-than-ever digital lives leave significant footprints in
cyberspace. Large scale collections of these socially generated footprints,
often known as big data, could help us to re-investigate different aspects of
our social collective behaviour in a quantitative framework. In this
contribution we discuss one such possibility: the monitoring and predicting of
popularity dynamics of candidates and parties through the analysis of socially
generated data on the web during electoral campaigns. Such data offer
considerable possibility for improving our awareness of popularity dynamics.
However, they also suffer from significant drawbacks in terms of
representativeness and generalisability. In this paper we discuss potential
ways around such problems, suggesting that the nature of different political systems
and contexts might lend differing levels of predictive power to certain types
of data source. We offer an initial exploratory test of these ideas, focussing
on two data streams: Wikipedia page views and Google search queries. On the
basis of these data, we present popularity dynamics from real case examples of
recent elections in three different countries.
Comment: To appear in Information Technolog
The distorted mirror of Wikipedia: a quantitative analysis of Wikipedia coverage of academics
Activity of modern scholarship creates online footprints galore. Along with
traditional metrics of research quality, such as citation counts, online images
of researchers and institutions increasingly matter in evaluating academic
impact, decisions about grant allocation, and promotion. We examined 400
biographical Wikipedia articles on academics from four scientific fields to
test if being featured in the world's largest online encyclopedia is correlated
with higher academic notability (assessed through citation counts). We found no
statistically significant correlation between Wikipedia article metrics
(length, number of edits, number of incoming links from other articles, etc.)
and academic notability of the mentioned researchers. We also did not find any
evidence that the scientists with better WP representation are necessarily more
prominent in their fields. In addition, we inspected the Wikipedia coverage of
notable scientists sampled from the Thomson Reuters list of "highly cited
researchers". In each of the examined fields, Wikipedia failed to cover
notable scholars properly. Both findings imply that Wikipedia might be
producing an inaccurate image of academics on the front end of science. By
shedding light on how public perception of academic progress is formed, this
study warns that a subjective element might have been introduced into the
hitherto structured system of academic evaluation.
Comment: To appear in EPJ Data Science. To have the Additional Files and
Datasets e-mail the corresponding autho
Modeling the Rise in Internet-based Petitions
Contemporary collective action, much of which involves social media and other
Internet-based platforms, leaves a digital imprint which may be harvested to
better understand the dynamics of mobilization. Petition signing is an example
of collective action which has gained in popularity with rising use of social
media and provides such data for the whole population of petition signatories
for a given platform. This paper tracks the growth curves of all 20,000
petitions to the UK government over 18 months, analyzing the rate of growth and
outreach mechanism. Previous research has suggested the importance of the first
day to the ultimate success of a petition, but has not examined early growth
within that day, made possible here through hourly resolution in the data. The
analysis shows that the vast majority of petitions do not achieve any measure
of success; over 99 percent fail to get the 10,000 signatures required for an
official response and only 0.1 percent attain the 100,000 required for a
parliamentary debate. We analyze the data through a multiplicative process
model framework to explain the heterogeneous growth of signatures at the
population level. We define and measure an average outreach factor for
petitions and show that it decays very fast (reducing to 0.1% after 10 hours).
After 24 hours, a petition's fate is virtually set. The findings seem to
challenge conventional analyses of collective action from economics and
political science, where the production function has been assumed to follow an
S-shaped curve.
Comment: Submitted to EPJ Data Scienc
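The multiplicative growth idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' model: the update rule N[t+1] = N[t] * (1 + mu[t]), the geometric decay of the outreach factor, and the names decaying_outreach, simulate_signatures, mu0 and decay are all assumptions made here for illustration, and the parameter values are arbitrary rather than fitted to the petition data.

```python
"""Toy multiplicative growth model for petition signatures (illustrative only)."""


def decaying_outreach(mu0, decay, hours):
    """Return an hourly outreach factor that shrinks geometrically.

    mu0 and decay are hypothetical parameters, not values from the paper.
    """
    return [mu0 * decay ** t for t in range(hours)]


def simulate_signatures(initial, outreach_by_hour):
    """Grow a signature count multiplicatively: N[t+1] = N[t] * (1 + mu[t])."""
    counts = [float(initial)]
    for mu in outreach_by_hour:
        counts.append(counts[-1] * (1.0 + mu))
    return counts


if __name__ == "__main__":
    # Arbitrary illustrative values: the outreach factor halves every hour.
    mu = decaying_outreach(mu0=1.0, decay=0.5, hours=48)
    counts = simulate_signatures(initial=10, outreach_by_hour=mu)
    print(f"after 24 hours: {counts[24]:.1f}")
    print(f"after 48 hours: {counts[48]:.1f}")
```

With a rapidly decaying outreach factor, almost all growth in this toy model happens in the first few hours, which is the qualitative behaviour the abstract describes; the actual decay rate would have to be estimated from the data.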
Topic Modelling of Everyday Sexism Project Entries
The Everyday Sexism Project documents everyday examples of sexism reported by
volunteer contributors from all around the world. It collected 100,000 entries
in 13+ languages within the first 3 years of its existence. The content of
reports in various languages submitted to Everyday Sexism is a valuable source
of crowdsourced information with great potential for feminist and gender
studies. In this paper, we take a computational approach to analyze the content
of reports. We use topic-modelling techniques to extract emerging topics and
concepts from the reports, and to map the semantic relations between those
topics. The resulting picture closely resembles and adds to that arrived at
through qualitative analysis, showing that this form of topic modeling could be
useful for sifting through datasets that had not previously been subject to any
analysis. More precisely, we come up with a map of topics for two different
resolutions of our topic model and discuss the connection between the
identified topics. In the low resolution picture, for instance, we found Public
space/Street, Online, Work related/Office, Transport, School, Media harassment,
and Domestic abuse. Among these, the strongest connection is between Public
space/Street harassment and Domestic abuse and sexism in personal
relationships. The strength of the relationships between topics illustrates the
fluid and ubiquitous nature of sexism, with no single experience being
unrelated to another.
Comment: preprint, under revie
Female scholars need to achieve more for equal public recognition
Different kinds of "gender gap" have been reported in different walks of
scientific life, almost always favouring male scientists over females. In this
work, for the first time, we present a large-scale empirical analysis to ask
whether female scientists with the same level of scientific accomplishment are
as likely as males to be recognised. We particularly focus on Wikipedia, the
open online encyclopedia, whose open nature gives us a proxy for
community recognition. We calculate the probability of appearing on Wikipedia
as a scientist for both male and female scholars in three different fields. We
find that women in Physics, Economics and Philosophy are considerably less
likely than men to be recognised on Wikipedia across all levels of achievement.
Comment: Under revie
Understanding Communication Patterns in MOOCs: Combining Data Mining and qualitative methods
Massive Open Online Courses (MOOCs) offer unprecedented opportunities to
learn at scale. Within a few years, the phenomenon of crowd-based learning has
gained enormous popularity with millions of learners across the globe
participating in courses ranging from Popular Music to Astrophysics. MOOCs have
captured the imaginations of many, attracting significant media attention,
with The New York Times naming 2012 "The Year of the MOOC." For those engaged
in learning analytics and educational data mining, MOOCs have provided an
exciting opportunity to develop innovative methodologies that harness big data
in education.
Comment: Preprint of a chapter to appear in "Data Mining and Learning
Analytics: Applications in Educational Research
Computational Courtship: Understanding the Evolution of Online Dating through Large-scale Data Analysis
Have we become more tolerant of dating people of different social backgrounds
compared to ten years ago? Has the rise of online dating exacerbated or
alleviated gender inequalities in modern courtship? Are the most attractive
people on these platforms necessarily the most successful? In this work, we
examine the mate preferences and communication patterns of male and female
users of the online dating site eHarmony over the past decade to identify how
attitudes and behaviors have changed over this time period. While other studies
have investigated disparities in user behavior between male and female users,
this study is unique in its longitudinal approach. Specifically, we analyze how
men and women differ in their preferences for certain traits in potential
partners and how those preferences have changed over time. The second line of
inquiry investigates to what extent physical attractiveness determines the rate
of messages a user receives, and how this relationship varies between men and
women. Thirdly, we explore whether online dating practices between males and
females have become more equal over time or if biases and inequalities have
remained constant (or increased). Fourthly, we study the behavioural traits in
sending and replying to messages based on one's own experience of receiving
messages and being replied to. Finally, we found that similarity between
profiles is not a predictor for success except for the number of children and
smoking habits. This work could have broader implications for shifting gender
norms and social attitudes, reflected in online courtship rituals. Apart from
the data-based research, we connect the results to existing theories that
concern the role of ICTs in societal change. As searching for love online
becomes increasingly common across generations and geographies, these findings
may shed light on how people can build relationships through the Internet.
Comment: Preprint, under revie
Early Prediction of Movie Box Office Success based on Wikipedia Activity Big Data
Use of socially generated "big data" to access information about collective
states of mind in human societies has become a new paradigm in the
emerging field of computational social science. A natural application of this
would be the prediction of the society's reaction to a new product in the sense
of popularity and adoption rate. However, bridging the gap between "real time
monitoring" and "early predicting" remains a big challenge. Here we report on
an endeavor to build a minimalistic predictive model for the financial success
of movies based on collective activity data of online users. We show that the
popularity of a movie can be predicted well before its release by measuring and
analyzing the activity level of editors and viewers of the movie's corresponding
entry in Wikipedia, the well-known online encyclopedia.
Comment: 13 pages, Including Supporting Information, 7 Figures, Download the
dataset from: http://wwm.phy.bme.hu/SupplementaryDataS1.zi
The most controversial topics in Wikipedia: A multilingual and geographical analysis
We present, visualize and analyse the similarities and differences between
the controversial topics related to "edit wars" identified in 10 different
language versions of Wikipedia. After a brief review of the related work we
describe the methods developed to locate, measure, and categorize the
controversial topics in the different languages. Visualizations of the degree
of overlap between the top 100 lists of most controversial articles in
different languages and the content related to geographical locations are
presented. We discuss what the presented analysis and visualizations can tell
us about the multicultural aspects of Wikipedia and practices of
peer-production. Our results indicate that Wikipedia is more than just an
encyclopaedia; it is also a window into convergent and divergent social-spatial
priorities, interests and preferences.
Comment: This is a draft of a book chapter to be published in 2014 by
Scarecrow Press. Please cite as: Yasseri T., Spoerri A., Graham M., and
Kertész J., The most controversial topics in Wikipedia: A multilingual and
geographical analysis. In: Fichman P., Hara N., editors, Global
Wikipedia: International and cross-cultural issues in online collaboration.
Scarecrow Press (2014
Mining public opinion: why unsuccessful online petitions should not be ignored
Taha Yasseri argues that by analysing online petition data using computational techniques, politicians can glean fresh insights about the geographic factors influencing constituents’ concerns, the dynamics at play over time, as well as a deeper awareness of the issues most important to the general public