
8,948 research outputs found

Pushing Your Point of View: Behavioral Measures of Manipulation in Wikipedia

Author: Das Sanmay
Lavoie Allen
Magdon-Ismail Malik
Publication date: 01/01/2011

As a major source of information on virtually any topic, Wikipedia serves an important role in the public dissemination and consumption of knowledge. As a result, it presents tremendous potential for people to promulgate their own points of view; such efforts may be more subtle than typical vandalism. In this paper, we introduce new behavioral metrics to quantify the level of controversy associated with a particular user: a Controversy Score (C-Score) based on the amount of attention the user focuses on controversial pages, and a Clustered Controversy Score (CC-Score) that also takes into account topical clustering. We show that both measures are useful for identifying people who try to "push" their points of view: they are good predictors of which editors get blocked. The metrics can be used to triage potential point-of-view (POV) pushers. We apply this idea to a dataset of users who requested promotion to administrator status and easily identify some editors who significantly changed their behavior upon becoming administrators. At the same time, such behavior is not rampant. Those who are promoted to administrator status tend to have more stable behavior than comparable groups of prolific editors. This suggests that the adminship process works well, and that the Wikipedia community is not overwhelmed by users who become administrators to promote their own points of view.
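The abstract describes the C-Score only as being based on the amount of attention a user focuses on controversial pages; it does not give a formula. The Python sketch below is a hypothetical proxy under that reading, not the authors' definition: it assumes a precomputed set of controversial page titles, scores a user by the fraction of their edits that land on those pages, and ranks users for triage. The names controversy_share and triage are illustrative, and the topical clustering behind the CC-Score is not modeled.

```python
def controversy_share(edits, controversial_pages):
    """Hypothetical C-Score proxy: fraction of a user's edits made on controversial pages.

    `edits` holds one page title per edit; `controversial_pages` is an assumed,
    precomputed set of titles. This is not the formula from the paper.
    """
    if not edits:
        return 0.0
    hits = sum(1 for page in edits if page in controversial_pages)
    return hits / len(edits)


def triage(user_edits, controversial_pages, top_k):
    """Rank users by the proxy score and return the top_k user names for manual review."""
    scores = {user: controversy_share(e, controversial_pages) for user, e in user_edits.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

A triage list produced this way is only a shortlist for human review; a high share of edits on contested pages is not by itself evidence of POV pushing.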

arXiv.org e-Print Archive

CiteSeerX

Towards Better Understanding Researcher Strategies in Cross-Lingual Event Analytics

Author: Bernacchi Viola
Demidova Elena
Gottschalk Simon
Rogers Richard
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

With an increasing amount of information on globally important events, there is a growing demand for efficient analytics of multilingual event-centric information. Such analytics is particularly challenging due to the large amount of content, the event dynamics and the language barrier. Although memory institutions increasingly collect event-centric Web content in different languages, very little is known about the strategies of researchers who conduct analytics of such content. In this paper we present researchers' strategies for content, method and feature selection in the context of cross-lingual event-centric analytics, as observed in two case studies on multilingual Wikipedia. We discuss the factors influencing these strategies and the findings enabled by the adopted methods, along with current limitations, and provide recommendations for services supporting researchers in cross-lingual event-centric analytics. Comment: In Proceedings of the International Conference on Theory and Practice of Digital Libraries 201

arXiv.org e-Print Archive

International Migration, Integration and Social Cohesion online publications

Distilling Information Reliability and Source Trustworthiness from Digital Traces

Author: Aalen O.
Daneshmand H.
De A.
Diamond S.
Du N.
Farajtabar M.
Gomez-Rodriguez M.
Gyöngyi Z.
Hunter D.
Liu X.
Wu M.
Zhao B.
Zhou K.
Řehůřek R.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017

Online knowledge repositories typically rely on their users or dedicated editors to evaluate the reliability of their content. These evaluations can be viewed as noisy measurements of both information reliability and information source trustworthiness. Can we leverage these noisy, often biased evaluations to distill a robust, unbiased and interpretable measure of both notions? In this paper, we argue that the temporal traces left by these noisy evaluations give cues on the reliability of the information and the trustworthiness of the sources. Then, we propose a temporal point process modeling framework that links these temporal traces to robust, unbiased and interpretable notions of information reliability and source trustworthiness. Furthermore, we develop an efficient convex optimization procedure to learn the parameters of the model from historical traces. Experiments on real-world data gathered from Wikipedia and Stack Overflow show that our modeling framework accurately predicts evaluation events, provides an interpretable measure of information reliability and source trustworthiness, and yields interesting insights about real-world events. Comment: Accepted at 26th World Wide Web conference (WWW-17
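The abstract names a temporal point process framework fit by convex optimization but does not specify the model. As a generic illustration only, the sketch below uses a univariate Hawkes process with an exponential kernel, a standard self-exciting point process; this choice is an assumption and not necessarily the paper's model. It computes the conditional intensity and the log-likelihood of a sequence of evaluation timestamps observed on [0, T]. With the decay rate beta held fixed, each log-intensity is the log of a function linear in (mu, alpha) and the compensator is linear in them, so the log-likelihood is concave in (mu, alpha), which is one reason fits of this kind can be posed as convex problems.

```python
import math


def hawkes_intensity(t, history, mu, alpha, beta):
    """Conditional intensity lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i))."""
    return mu + alpha * sum(math.exp(-beta * (t - ti)) for ti in history if ti < t)


def hawkes_log_likelihood(events, T, mu, alpha, beta):
    """Log-likelihood of event times observed on [0, T] under the Hawkes model above."""
    events = sorted(events)
    # Sum of log-intensities at each observed event (only strictly earlier events excite it)
    log_term = sum(math.log(hawkes_intensity(t, events, mu, alpha, beta)) for t in events)
    # Compensator: integral of lambda(t) over [0, T], closed form for the exponential kernel
    compensator = mu * T + (alpha / beta) * sum(1.0 - math.exp(-beta * (T - ti)) for ti in events)
    return log_term - compensator
```

The quadratic loop over past events keeps the code short; a production fit would use the usual recursive update of the exponential sum instead.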

arXiv.org e-Print Archive

Crossref

CISPA – Helmholtz-Zentrum für Informationssicherheit

A practical approach to language complexity: a Wikipedia case study

Author: A Halavais
A Kornai
A Mikheev
András Kornai
D van Leijenhorst
D Varga
E Gabrilovich
Eduardo G. Altmann
EG Altmann
F Tweedie
GR Klare
JC Roberts
János Kertész
M Serrano
MD Besten
MK Paasche-Orlow
O Medelyan
R Baeza Yates
R Gunning
R Lambiotte
S Javanmardi
T Yasseri
Taha Yasseri
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2012

In this paper we present a statistical analysis of English texts from Wikipedia. We try to address the issue of language complexity empirically by comparing the Simple English Wikipedia (Simple) to comparable samples of the main English Wikipedia (Main). Simple is supposed to use simplified language with a limited vocabulary, and editors are explicitly requested to follow this guideline, yet in practice the vocabulary richness of both samples is at the same level. Detailed analysis of longer units (n-grams of words and part-of-speech tags) shows that the language of Simple is less complex than that of Main primarily due to the use of shorter sentences, as opposed to drastically simplified syntax or vocabulary. Comparing the two language varieties by the Gunning readability index supports this conclusion. We also report on the topical dependence of language complexity: the language is more advanced in conceptual articles than in person-based (biographical) and object-based articles. Finally, we investigate the relation between conflict and language complexity by analyzing the content of the talk pages associated with controversial and peacefully developing articles, concluding that controversy has the effect of reducing language complexity.
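The Gunning readability index mentioned above (the Gunning fog index) is commonly given as 0.4 × (words per sentence + 100 × share of words with three or more syllables), which is why shorter sentences alone can lower it. The sketch below computes it with a crude vowel-group syllable counter and simple regex tokenization; both are simplifying assumptions, it skips the standard exclusions for proper nouns and suffixed forms, and it is not the tooling used in the study.

```python
import re


def count_syllables(word):
    """Crude heuristic: count groups of consecutive vowels, discounting a final silent 'e'."""
    w = word.lower()
    n = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)


def gunning_fog(text):
    """0.4 * (average sentence length + percentage of words with >= 3 syllables)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))
```

For example, gunning_fog("The cat sat. The dog ran.") gives about 1.2: six words in two sentences and no word of three or more syllables.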

arXiv.org e-Print Archive

Crossref

SZTAKI Publication Repository

Directory of Open Access Journals

PubMed Central

FigShare

Dynamics of conflicts in Wikipedia

Author: A Capocci
A Halavais
A Kittur
A Vázquez
AK Laird
AL Barabási
András Kornai
András Rung
Attila Szolnoki
B Adler
B Suh
BQ Vuong
D Laniado
DG Champernowne
DM Wilkinson
DW McDonald
F Ortega
F Tyers
FB Viegas
H Zha
J Giles
J Leskovec
J Ratkiewicz
J Schneider
J Voss
János Kertész
K Samson
K Smets
KI Goh
L Buriol
M Hu
M Karsai
M Potthast
M Strube
O Medelyan
P Massa
R Kimmons
R Sumi
RL Rivest
Robert Sumi
S Javanmardi
S Vajna
SKS Sharoff
SP Ponzetto
T Gowers
T Yasseri
Taha Yasseri
U Brandes
V Zlatić
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2012

In this work we study the dynamical features of editorial wars in Wikipedia (WP). Based on our previously established algorithm, we build samples of controversial and peaceful articles and analyze the temporal characteristics of the activity in these samples. On short time scales, we show that there is a clear correspondence between conflict and burstiness of activity patterns, and that memory effects play an important role in controversies. On long time scales, we identify three distinct developmental patterns for the overall behavior of the articles. We are able to distinguish cases eventually leading to consensus from those where a compromise is far from achievable. Finally, we analyze discussion networks and conclude that edit wars are mainly fought by only a few editors. Comment: Supporting information added
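A widely used way to quantify burstiness and memory in activity patterns is the pair of coefficients introduced by Goh and Barabási: the burstiness B = (σ − μ)/(σ + μ) of the inter-event times, which is −1 for perfectly regular activity, near 0 for Poisson-like activity and approaches 1 for highly bursty activity, and the memory coefficient M, the correlation between consecutive inter-event times. The abstract does not say these exact measures were used, so treat the sketch below as an illustration; it assumes the edit timestamps of a single article are given as numbers (for example, seconds).

```python
import statistics


def inter_event_times(timestamps):
    """Gaps between consecutive events after sorting."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]


def burstiness(timestamps):
    """B = (sigma - mu) / (sigma + mu) of inter-event times; needs >= 2 timestamps, not all equal."""
    tau = inter_event_times(timestamps)
    mu = statistics.mean(tau)
    sigma = statistics.pstdev(tau)
    return (sigma - mu) / (sigma + mu)


def memory_coefficient(timestamps):
    """Pearson correlation between consecutive inter-event times (tau_i, tau_{i+1}).

    Needs >= 4 timestamps and non-constant gaps, otherwise the correlation is undefined.
    """
    tau = inter_event_times(timestamps)
    x, y = tau[:-1], tau[1:]
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (statistics.pstdev(x) * statistics.pstdev(y))
```

Population statistics are used consistently in the numerator and denominator, so memory_coefficient equals the ordinary Pearson correlation of the shifted gap sequences.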

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

Crossref

SZTAKI Publication Repository

Directory of Open Access Journals

PubMed Central

Oxford University Research Archive

FigShare

Exploring the Relationship between Membership Turnover and Productivity in Online Communities

Author: Cunningham Pádraig
Qin Xiangju
Salter-Townshend Michael
Publication date: 30/01/2014

One of the more disruptive reforms associated with the modern Internet is the emergence of online communities working together on knowledge artefacts such as Wikipedia and OpenStreetMap. Recently it has become clear that these initiatives are vulnerable to problems with membership turnover. This study presents a longitudinal analysis of 891 WikiProjects in which we model the impact of member turnover and social capital losses on project productivity. By examining social capital losses we attempt to provide a more nuanced analysis of member turnover. In this context, social capital is modelled from a social network perspective, where the loss of more central members has more impact. We find that only a small proportion of WikiProjects are in a relatively healthy state, with low levels of membership turnover and social capital losses. The results show that the relationship between social capital losses and project performance is U-shaped, and that member withdrawal has a significant negative effect on project outcomes. The results also support a mediating role of turnover rate and network density in this curvilinear relationship.
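A U-shaped (curvilinear) relationship like the one reported between social capital losses and project performance is typically tested by adding a squared term to a regression. The sketch below fits y = b0 + b1·x + b2·x² by ordinary least squares and calls the relation U-shaped when b2 > 0 and the turning point −b1/(2·b2) falls inside the observed range of x. It is a generic single-predictor check written for illustration, not the paper's longitudinal model, and it omits the mediation analysis.

```python
def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2; returns (b0, b1, b2). Needs >= 3 distinct x."""
    # Normal equations (X^T X) b = X^T y with design columns [1, x, x^2]
    s = [sum(xi ** k for xi in x) for k in range(5)]  # sums of x^0 .. x^4
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]  # augmented 3x4 matrix
    # Gauss-Jordan elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [rv - f * cv for rv, cv in zip(a[r], a[col])]
    return tuple(a[i][3] / a[i][i] for i in range(3))


def is_u_shaped(x, y):
    """True if the fitted parabola opens upward with its minimum inside the range of x."""
    b0, b1, b2 = fit_quadratic(x, y)
    if b2 <= 0:
        return False
    vertex = -b1 / (2 * b2)
    return min(x) < vertex < max(x)
```

Requiring the turning point to lie inside the data guards against calling a merely convex but monotone trend "U-shaped".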

arXiv.org e-Print Archive

Research Repository UCD

Irish Universities

Association for the Advancement of Artificial Intelligence: AAAI Publications

Early Prediction of Movie Box Office Success based on Wikipedia Activity Big Data

Author: A Halavais
A Ishii
A Spoerri
Attila Szolnoki
B Suh
C Castillo
CA Hidalgo
G Eysenbach
HS Moat
J Bollen
J Ginsberg
J Ratkiewicz
J Török
János Kertész
Márton Mestyán
R Kimmons
R Sharda
RK Pan
S Saavedra
S Sinha
S Sreenivasan
T Brody
T Holloway
T Preis
T Yasseri
Taha Yasseri
X Shuai
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

Use of socially generated "big data" to access information about collective states of mind in human societies has become a new paradigm in the emerging field of computational social science. A natural application of this would be predicting society's reaction to a new product in terms of popularity and adoption rate. However, bridging the gap between "real-time monitoring" and "early prediction" remains a big challenge. Here we report on an effort to build a minimalistic predictive model for the financial success of movies based on collective activity data of online users. We show that the popularity of a movie can be predicted well before its release by measuring and analyzing the activity level of editors and viewers of the movie's entry in Wikipedia, the well-known online encyclopedia. Comment: 13 pages, including Supporting Information, 7 figures. Download the dataset from: http://wwm.phy.bme.hu/SupplementaryDataS1.zi
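The abstract describes a minimalistic predictive model linking Wikipedia activity to box-office success without giving its form. Purely as an illustration, the sketch below fits a one-variable least-squares line on a log-log scale, assuming a single hypothetical activity feature per movie (for instance, page views counted before release) and observed revenues; the feature choice and the log transform are assumptions, not the paper's specification.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def fit_log_log(activity, revenue):
    """Fit log(revenue) = a + b*log(activity); all inputs must be positive."""
    return fit_line([math.log(v) for v in activity], [math.log(r) for r in revenue])


def predict_revenue(model, activity):
    """Predicted revenue for a new movie's (hypothetical) pre-release activity level."""
    a, b = model
    return math.exp(a + b * math.log(activity))
```

Working on the log scale keeps a few blockbusters from dominating the fit, at the cost of predicting a typical rather than a mean revenue.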

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Aaltodoc Publication Archive

Oxford University Research Archive

FigShare

Argumentation Mining in User-Generated Web Discourse

Author: Gurevych Iryna
Habernal Ivan
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2015

The goal of argumentation mining, an evolving research field in computational linguistics, is to design methods capable of analyzing people's argumentation. In this article, we go beyond the state of the art in several ways. (i) We deal with actual Web data and take up the challenges posed by the variety of registers, multiple domains, and unrestricted, noisy user-generated Web discourse. (ii) We bridge the gap between normative argumentation theories and argumentation phenomena encountered in actual data by adapting an argumentation model tested in an extensive annotation study. (iii) We create a new gold standard corpus (90k tokens in 340 documents) and experiment with several machine learning methods to identify argument components. We offer the data, source code, and annotation guidelines to the community under free licenses. Our findings show that argumentation mining in user-generated Web discourse is a feasible but challenging task. Comment: Cite as: Habernal, I. & Gurevych, I. (2017). Argumentation Mining in User-Generated Web Discourse. Computational Linguistics 43(1), pp. 125-17
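Identifying argument components, as in point (iii), is often framed as token-level sequence labeling with BIO tags (Begin, Inside, Outside). The sketch below converts annotated spans to BIO tags and back; the span format (token offsets with an exclusive end) and the label names in the usage line are assumptions for illustration, and the code does not reproduce the article's features or classifiers.

```python
def spans_to_bio(n_tokens, spans):
    """Convert (start, end_exclusive, label) token spans into a list of BIO tags."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags):
    """Recover (start, end_exclusive, label) spans from a list of BIO tags."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a span at the end
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if tag == "O" or starts_new:
            if start is not None:
                spans.append((start, i, label))
                start, label = None, None
            if tag != "O":
                start, label = i, tag[2:]  # a stray I- tag is treated as a new span
    return spans


tags = spans_to_bio(5, [(0, 2, "Claim"), (3, 5, "Premise")])
print(tags)  # ['B-Claim', 'I-Claim', 'O', 'B-Premise', 'I-Premise']
```

A sequence classifier then only has to predict one tag per token, and bio_to_spans turns its output back into component spans for evaluation.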

arXiv.org e-Print Archive

TUbiblio

Crossref

Directory of Open Access Journals

TUdatalib Repository (TU Darmstadt)