Can electoral popularity be predicted using socially generated big data?
Today, our more-than-ever digital lives leave significant footprints in
cyberspace. Large scale collections of these socially generated footprints,
often known as big data, could help us to re-investigate different aspects of
our social collective behaviour in a quantitative framework. In this
contribution we discuss one such possibility: the monitoring and predicting of
popularity dynamics of candidates and parties through the analysis of socially
generated data on the web during electoral campaigns. Such data offer
considerable possibility for improving our awareness of popularity dynamics.
However, they also suffer from significant drawbacks in terms of
representativeness and generalisability. In this paper we discuss potential
ways around such problems, suggesting the nature of different political systems
and contexts might lend differing levels of predictive power to certain types
of data source. We offer an initial exploratory test of these ideas, focussing
on two data streams, Wikipedia page views and Google search queries. On the
basis of this data, we present popularity dynamics from real case examples of
recent elections in three different countries.
Comment: To appear in Information Technolog
The distorted mirror of Wikipedia: a quantitative analysis of Wikipedia coverage of academics
Activity of modern scholarship creates online footprints galore. Along with
traditional metrics of research quality, such as citation counts, online images
of researchers and institutions increasingly matter in evaluating academic
impact, decisions about grant allocation, and promotion. We examined 400
biographical Wikipedia articles on academics from four scientific fields to
test if being featured in the world's largest online encyclopedia is correlated
with higher academic notability (assessed through citation counts). We found no
statistically significant correlation between Wikipedia article metrics
(length, number of edits, number of incoming links from other articles, etc.)
and academic notability of the mentioned researchers. We also did not find any
evidence that the scientists with better WP representation are necessarily more
prominent in their fields. In addition, we inspected the Wikipedia coverage of
notable scientists sampled from the Thomson Reuters list of "highly cited
researchers". In each of the examined fields, Wikipedia failed to cover
notable scholars properly. Both findings imply that Wikipedia might be
producing an inaccurate image of academics on the front end of science. By
shedding light on how public perception of academic progress is formed, this
study warns that a subjective element might have been introduced into the
hitherto structured system of academic evaluation.
Comment: To appear in EPJ Data Science. To have the Additional Files and
Datasets e-mail the corresponding autho
Mining public opinion: why unsuccessful online petitions should not be ignored
Taha Yasseri argues that by analysing online petition data using computational techniques, politicians can glean fresh insights about the geographic factors influencing constituents’ concerns, the dynamics at play over time, as well as a deeper awareness of the issues most important to the general public.
Modeling the Rise in Internet-based Petitions
Contemporary collective action, much of which involves social media and other
Internet-based platforms, leaves a digital imprint which may be harvested to
better understand the dynamics of mobilization. Petition signing is an example
of collective action which has gained in popularity with rising use of social
media and provides such data for the whole population of petition signatories
for a given platform. This paper tracks the growth curves of all 20,000
petitions to the UK government over 18 months, analyzing the rate of growth and
outreach mechanism. Previous research has suggested the importance of the first
day to the ultimate success of a petition, but has not examined early growth
within that day, made possible here through hourly resolution in the data. The
analysis shows that the vast majority of petitions do not achieve any measure
of success; over 99 percent fail to get the 10,000 signatures required for an
official response and only 0.1 percent attain the 100,000 required for a
parliamentary debate. We analyze the data through a multiplicative process
model framework to explain the heterogeneous growth of signatures at the
population level. We define and measure an average outreach factor for
petitions and show that it decays very fast (reducing to 0.1% after 10 hours).
After 24 hours, a petition's fate is virtually set. The findings seem to
challenge conventional analyses of collective action from economics and
political science, where the production function has been assumed to follow an
S-shaped curve.
Comment: Submitted to EPJ Data Scienc
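The multiplicative growth idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' model: the update rule, the function names, and the halving-every-hour decay are all assumptions chosen for illustration. With a one-hour half-life, the hypothetical outreach factor falls to 2^-10 (about 0.1%) of its starting value after 10 hours, which loosely echoes the decay the abstract reports.

```python
def decaying_outreach(hour, initial=0.5, half_life_hours=1.0):
    """Hypothetical outreach factor that halves every `half_life_hours`."""
    return initial * 0.5 ** (hour / half_life_hours)


def simulate_signatures(first_hour_count, outreach, hours):
    """Multiplicative growth: each hour's count is the previous count
    scaled by (1 + outreach factor for that hour)."""
    counts = [float(first_hour_count)]
    for hour in range(hours):
        counts.append(counts[-1] * (1.0 + outreach(hour)))
    return counts


# Illustrative run: 10 signatures in the first hour, followed for 48 hours.
counts = simulate_signatures(10, decaying_outreach, hours=48)
growth_first_day = counts[24] / counts[0]    # growth during hours 0-24
growth_second_day = counts[48] / counts[24]  # growth during hours 24-48
```

Under these assumed parameters nearly all growth happens in the first day and the second-day growth ratio is indistinguishable from 1, which is one way to picture a petition's fate being set after 24 hours.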
Topic Modelling of Everyday Sexism Project Entries
The Everyday Sexism Project documents everyday examples of sexism reported by
volunteer contributors from all around the world. It collected 100,000 entries
in 13+ languages within the first 3 years of its existence. The content of
reports in various languages submitted to Everyday Sexism is a valuable source
of crowdsourced information with great potential for feminist and gender
studies. In this paper, we take a computational approach to analyze the content
of reports. We use topic-modelling techniques to extract emerging topics and
concepts from the reports, and to map the semantic relations between those
topics. The resulting picture closely resembles and adds to that arrived at
through qualitative analysis, showing that this form of topic modeling could be
useful for sifting through datasets that had not previously been subject to any
analysis. More precisely, we come up with a map of topics for two different
resolutions of our topic model and discuss the connection between the
identified topics. In the low resolution picture, for instance, we found Public
space/Street, Online, Work related/Office, Transport, School, Media harassment,
and Domestic abuse. Among these, the strongest connection is between Public
space/Street harassment and Domestic abuse and sexism in personal
relationships. The strength of the relationships between topics illustrates the
fluid and ubiquitous nature of sexism, with no single experience being
unrelated to another.
Comment: preprint, under revie
Female scholars need to achieve more for equal public recognition
Different kinds of "gender gap" have been reported in different walks of the
scientific life, almost always favouring male scientists over females. In this
work, for the first time, we present a large-scale empirical analysis to ask
whether female scientists with the same level of scientific accomplishment are
as likely as males to be recognised. We particularly focus on Wikipedia, the
open online encyclopedia, whose open nature gives us a proxy for
community recognition. We calculate the probability of appearing on Wikipedia
as a scientist for both male and female scholars in three different fields. We
find that women in Physics, Economics and Philosophy are considerably less
likely than men to be recognised on Wikipedia across all levels of achievement.
Comment: Under revie
Wikipedia traffic data and electoral prediction: towards theoretically informed models
The aim of this article is to explore the potential use of Wikipedia page
view data for predicting electoral results. Responding to previous critiques of
work using socially generated data to predict elections, which have argued that
these predictions take place without any understanding of the mechanism which
enables them, we first develop a theoretical model which highlights why people
might seek information online at election time, and how this activity might
relate to overall electoral outcomes, focussing especially on how different
types of parties such as new and established parties might generate different
information seeking patterns. We test this model on a novel dataset drawn from
a variety of countries in the 2009 and 2014 European Parliament elections. We
show that while Wikipedia offers little insight into absolute vote outcomes, it
offers good information about changes in both overall turnout at elections
and in vote share for particular parties. These results are used to enhance
existing theories about the drivers of aggregate patterns in online information
seeking.
Comment: submitted to EPJ Data Science. Additional File 1 available at
https://drive.google.com/open?id=0BxaGC-YCTO6SWkJhRXlrMVRYVl
Gender Imbalance and Spatiotemporal Patterns of Contributions to Citizen Science Projects: The Case of Zooniverse
Citizen Science is research undertaken by professional scientists and members of the public collaboratively. Despite numerous benefits of citizen science for both the advancement of science and the community of the citizen scientists, there is still no comprehensive knowledge of patterns of contributions, and the demography of contributors to citizen science projects. In this paper we provide a first overview of spatiotemporal and gender distribution of citizen science workforce by analyzing 54 million classifications contributed by more than 340 thousand citizen science volunteers from 198 countries to one of the largest online citizen science platforms, Zooniverse. First we report on the uneven geographical distribution of the citizen scientists and model the variations among countries based on the socio-economic conditions as well as the level of research investment in each country. Analyzing the temporal features of contributions, we report on high “burstiness” of participation instances as well as the leisurely nature of participation suggested by the time of day when the citizen scientists were the most active. Finally, we discuss the gender imbalance among online citizen scientists (about 30% female) and compare it with other collaborative projects as well as the gender distribution in more formal scientific activities. Online citizen science projects need further attention from outside of the academic community, and our findings can help attract the attention of public and private stakeholders and inform the design of the platforms and science policy making processes.
Understanding Communication Patterns in MOOCs: Combining Data Mining and qualitative methods
Massive Open Online Courses (MOOCs) offer unprecedented opportunities to
learn at scale. Within a few years, the phenomenon of crowd-based learning has
gained enormous popularity with millions of learners across the globe
participating in courses ranging from Popular Music to Astrophysics. They have
captured the imaginations of many, attracting significant media attention -
with The New York Times naming 2012 "The Year of the MOOC." For those engaged
in learning analytics and educational data mining, MOOCs have provided an
exciting opportunity to develop innovative methodologies that harness big data
in education.
Comment: Preprint of a chapter to appear in "Data Mining and Learning
Analytics: Applications in Educational Research
Dissent and Rebellion in the House of Commons: A Social Network Analysis of Brexit-Related Divisions in the 57th Parliament
The British party system is known for its discipline and cohesion, but it
remains wedged on one issue: European integration. This was observed both in
the days of the EEC in the 1970s and the EU-Maastricht treaty in the 1990s.
This work aims to investigate whether this holds true in the Brexit era. We
utilise social network analysis to unpack the patterns of dissent and rebellion
among pairs of MPs. Using data from Hansard, we compute similarity scores
between pairs of MPs from June 2017 until April 2019 and visualise them in a
force-directed network. Comparing Brexit- and non-Brexit divisions, we analyse
whether patterns of voting similarity and polarity differ among pairs of MPs.
Our results show that Brexit causes a wedge in party politics, consistent with
what is observed in history.
Comment: Preprint under revie
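A pairwise voting similarity score of the kind mentioned in the abstract above could be sketched as follows. The abstract does not define the score, so this sketch assumes a simple agreement rate: the fraction of divisions in which both MPs voted and cast the same vote. The function names and the sample records are hypothetical and are not taken from Hansard.

```python
from itertools import combinations


def vote_similarity(votes_a, votes_b):
    """Share of jointly attended divisions on which two MPs agreed.

    Each argument maps a division id to a vote such as "aye" or "no".
    Returns None when the two MPs share no divisions.
    """
    shared = [division for division in votes_a if division in votes_b]
    if not shared:
        return None
    agreements = sum(1 for d in shared if votes_a[d] == votes_b[d])
    return agreements / len(shared)


def similarity_edges(records):
    """Score every unordered pair of MPs; the result can serve as the
    weighted edge list of a similarity network."""
    return {
        (a, b): vote_similarity(records[a], records[b])
        for a, b in combinations(sorted(records), 2)
    }


# Made-up voting records, for illustration only.
records = {
    "mp_a": {1: "aye", 2: "no", 3: "aye"},
    "mp_b": {1: "aye", 2: "aye", 3: "aye"},
    "mp_c": {4: "no"},
}
edges = similarity_edges(records)
```

Pairs with no shared divisions are scored None rather than 0, so that absence from the same votes is not mistaken for disagreement.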