
464 research outputs found

Can electoral popularity be predicted using socially generated big data?

Author: Bright Jonathan
Yasseri Taha
Publication date: 01/01/2014

Today, our more-than-ever digital lives leave significant footprints in cyberspace. Large-scale collections of these socially generated footprints, often known as big data, could help us to re-investigate different aspects of our social collective behaviour in a quantitative framework. In this contribution we discuss one such possibility: the monitoring and predicting of popularity dynamics of candidates and parties through the analysis of socially generated data on the web during electoral campaigns. Such data offer considerable possibility for improving our awareness of popularity dynamics. However, they also suffer from significant drawbacks in terms of representativeness and generalisability. In this paper we discuss potential ways around such problems, suggesting that the nature of different political systems and contexts might lend differing levels of predictive power to certain types of data source. We offer an initial exploratory test of these ideas, focussing on two data streams, Wikipedia page views and Google search queries. On the basis of these data, we present popularity dynamics from real case examples of recent elections in three different countries. Comment: To appear in Information Technology

arXiv.org e-Print Archive

Oxford University Research Archive

The distorted mirror of Wikipedia: a quantitative analysis of Wikipedia coverage of academics

Author: Samoilenko Anna
Yasseri Taha
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/12/2013

Activity of modern scholarship creates online footprints galore. Along with traditional metrics of research quality, such as citation counts, online images of researchers and institutions increasingly matter in evaluating academic impact, decisions about grant allocation, and promotion. We examined 400 biographical Wikipedia articles on academics from four scientific fields to test if being featured in the world's largest online encyclopedia is correlated with higher academic notability (assessed through citation counts). We found no statistically significant correlation between Wikipedia article metrics (length, number of edits, number of incoming links from other articles, etc.) and academic notability of the mentioned researchers. We also did not find any evidence that the scientists with better Wikipedia representation are necessarily more prominent in their fields. In addition, we inspected the Wikipedia coverage of notable scientists sampled from the Thomson Reuters list of "highly cited researchers". In each of the examined fields, Wikipedia failed to cover notable scholars properly. Both findings imply that Wikipedia might be producing an inaccurate image of academics on the front end of science. By shedding light on how public perception of academic progress is formed, this study warns that a subjective element might have been introduced into the hitherto structured system of academic evaluation. Comment: To appear in EPJ Data Science. To have the Additional Files and Datasets e-mail the corresponding author

arXiv.org e-Print Archive

Springer - Publisher Connector

Modeling the Rise in Internet-based Petitions

Author: Hale Scott A.
Margetts Helen
Yasseri Taha
Publication date: 14/08/2014

Contemporary collective action, much of which involves social media and other Internet-based platforms, leaves a digital imprint which may be harvested to better understand the dynamics of mobilization. Petition signing is an example of collective action which has gained in popularity with rising use of social media and provides such data for the whole population of petition signatories for a given platform. This paper tracks the growth curves of all 20,000 petitions to the UK government over 18 months, analyzing the rate of growth and outreach mechanism. Previous research has suggested the importance of the first day to the ultimate success of a petition, but has not examined early growth within that day, made possible here through hourly resolution in the data. The analysis shows that the vast majority of petitions do not achieve any measure of success; over 99 percent fail to get the 10,000 signatures required for an official response and only 0.1 percent attain the 100,000 required for a parliamentary debate. We analyze the data through a multiplicative process model framework to explain the heterogeneous growth of signatures at the population level. We define and measure an average outreach factor for petitions and show that it decays very fast (reducing to 0.1% after 10 hours). After 24 hours, a petition's fate is virtually set. The findings seem to challenge conventional analyses of collective action from economics and political science, where the production function has been assumed to follow an S-shaped curve. Comment: Submitted to EPJ Data Science

arXiv.org e-Print Archive

Topic Modelling of Everyday Sexism Project Entries

Author: Eccles Kathryn
Melville Sophie
Yasseri Taha
Publication date: 05/04/2018

The Everyday Sexism Project documents everyday examples of sexism reported by volunteer contributors from all around the world. It collected 100,000 entries in 13+ languages within the first 3 years of its existence. The content of reports in various languages submitted to Everyday Sexism is a valuable source of crowdsourced information with great potential for feminist and gender studies. In this paper, we take a computational approach to analyze the content of reports. We use topic-modelling techniques to extract emerging topics and concepts from the reports, and to map the semantic relations between those topics. The resulting picture closely resembles and adds to that arrived at through qualitative analysis, showing that this form of topic modeling could be useful for sifting through datasets that had not previously been subject to any analysis. More precisely, we come up with a map of topics for two different resolutions of our topic model and discuss the connection between the identified topics. In the low resolution picture, for instance, we found Public space/Street, Online, Work related/Office, Transport, School, Media harassment, and Domestic abuse. Among these, the strongest connection is between Public space/Street harassment and Domestic abuse and sexism in personal relationships. The strength of the relationships between topics illustrates the fluid and ubiquitous nature of sexism, with no single experience being unrelated to another. Comment: preprint, under review

arXiv.org e-Print Archive

Oxford University Research Archive

Female scholars need to achieve more for equal public recognition

Author: Holstege Floris
Schellekens Menno H.
Yasseri Taha
Publication date: 01/01/2019

Different kinds of "gender gap" have been reported in different walks of scientific life, almost always favouring male scientists over females. In this work, for the first time, we present a large-scale empirical analysis to ask whether female scientists with the same level of scientific accomplishment are as likely as males to be recognised. We particularly focus on Wikipedia, the open online encyclopedia, whose open nature gives us a proxy for community recognition. We calculate the probability of appearing on Wikipedia as a scientist for both male and female scholars in three different fields. We find that women in Physics, Economics and Philosophy are considerably less likely than men to be recognised on Wikipedia across all levels of achievement. Comment: Under review

arXiv.org e-Print Archive

Oxford University Research Archive

Understanding Communication Patterns in MOOCs: Combining Data Mining and qualitative methods

Author: Eynon Rebecca
Gillani Nabeel
Hjorth Isis
Yasseri Taha
Publication date: 01/01/2016

Massive Open Online Courses (MOOCs) offer unprecedented opportunities to learn at scale. Within a few years, the phenomenon of crowd-based learning has gained enormous popularity with millions of learners across the globe participating in courses ranging from Popular Music to Astrophysics. They have captured the imaginations of many, attracting significant media attention, with The New York Times naming 2012 "The Year of the MOOC." For those engaged in learning analytics and educational data mining, MOOCs have provided an exciting opportunity to develop innovative methodologies that harness big data in education. Comment: Preprint of a chapter to appear in "Data Mining and Learning Analytics: Applications in Educational Research"

arXiv.org e-Print Archive

Oxford University Research Archive

Computational Courtship: Understanding the Evolution of Online Dating through Large-scale Data Analysis

Author: Blex Chris
Dinh Rachel
Gildersleve Patrick
Yasseri Taha
Publication date: 28/06/2020

Have we become more tolerant of dating people of different social backgrounds compared to ten years ago? Has the rise of online dating exacerbated or alleviated gender inequalities in modern courtship? Are the most attractive people on these platforms necessarily the most successful? In this work, we examine the mate preferences and communication patterns of male and female users of the online dating site eHarmony over the past decade to identify how attitudes and behaviors have changed over this time period. While other studies have investigated disparities in user behavior between male and female users, this study is unique in its longitudinal approach. Specifically, we analyze how men and women differ in their preferences for certain traits in potential partners and how those preferences have changed over time. The second line of inquiry investigates to what extent physical attractiveness determines the rate of messages a user receives, and how this relationship varies between men and women. Thirdly, we explore whether online dating practices between males and females have become more equal over time or if biases and inequalities have remained constant (or increased). Fourthly, we study the behavioural traits in sending and replying to messages based on one's own experience of receiving messages and being replied to. Finally, we found that similarity between profiles is not a predictor for success except for the number of children and smoking habits. This work could have broader implications for shifting gender norms and social attitudes, reflected in online courtship rituals. Apart from the data-based research, we connect the results to existing theories that concern the role of ICTs in societal change. As searching for love online becomes increasingly common across generations and geographies, these findings may shed light on how people can build relationships through the Internet. Comment: Preprint, under review

arXiv.org e-Print Archive

Early Prediction of Movie Box Office Success based on Wikipedia Activity Big Data

Author: A Halavais
A Ishii
A Spoerri
Attila Szolnoki
B Suh
C Castillo
CA Hidalgo
G Eysenbach
HS Moat
J Bollen
J Ginsberg
J Ratkiewicz
J Török
János Kertész
Márton Mestyán
R Kimmons
R Sharda
RK Pan
S Saavedra
S Sinha
S Sreenivasan
T Brody
T Holloway
T Preis
T Yasseri
Taha Yasseri
X Shuai
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

Use of socially generated "big data" to access information about collective states of mind in human societies has become a new paradigm in the emerging field of computational social science. A natural application of this would be the prediction of society's reaction to a new product in terms of popularity and adoption rate. However, bridging the gap between "real time monitoring" and "early predicting" remains a big challenge. Here we report on an endeavor to build a minimalistic predictive model for the financial success of movies based on collective activity data of online users. We show that the popularity of a movie can be predicted well before its release by measuring and analyzing the activity level of editors and viewers of the movie's corresponding entry in Wikipedia, the well-known online encyclopedia. Comment: 13 pages, Including Supporting Information, 7 Figures, Download the dataset from: http://wwm.phy.bme.hu/SupplementaryDataS1.zi

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Aaltodoc Publication Archive

Oxford University Research Archive

FigShare

The most controversial topics in Wikipedia: A multilingual and geographical analysis

Author: Graham Mark
Kertész János
Spoerri Anselm
Yasseri Taha
Publication date: 08/07/2013

We present, visualize and analyse the similarities and differences between the controversial topics related to "edit wars" identified in 10 different language versions of Wikipedia. After a brief review of the related work we describe the methods developed to locate, measure, and categorize the controversial topics in the different languages. Visualizations of the degree of overlap between the top 100 lists of most controversial articles in different languages and the content related to geographical locations will be presented. We discuss what the presented analysis and visualizations can tell us about the multicultural aspects of Wikipedia and practices of peer-production. Our results indicate that Wikipedia is more than just an encyclopaedia; it is also a window into convergent and divergent social-spatial priorities, interests and preferences. Comment: This is a draft of a book chapter to be published in 2014 by Scarecrow Press. Please cite as: Yasseri T., Spoerri A., Graham M., and Kertész J., The most controversial topics in Wikipedia: A multilingual and geographical analysis. In: Fichman P., Hara N., editors, Global Wikipedia: International and cross-cultural issues in online collaboration. Scarecrow Press (2014)

arXiv.org e-Print Archive

International Development Research Centre: IDRC Digital Library

Mining public opinion: why unsuccessful online petitions should not be ignored

Author: Yasseri Taha
Publication venue: London School of Economics and Political Science
Publication date: 04/08/2020

Taha Yasseri argues that by analysing online petition data using computational techniques, politicians can glean fresh insights about the geographic factors influencing constituents’ concerns and the dynamics at play over time, as well as a deeper awareness of the issues most important to the general public.

LSE Research Online