Edit wars in Wikipedia
We present a new, efficient method for automatically detecting severe
conflicts ("edit wars") in Wikipedia and evaluate this method on six different
language WPs. We discuss how the number of edits and reverts, the length of
discussions, and the burstiness of edits and reverts deviate in such pages from
those following the general workflow, and argue that earlier work has
significantly over-estimated the contentiousness of the Wikipedia editing
process.
Comment: 4 pages, 2 figures, 3 tables. The current version is shortened to be
published in SocialCom 201
Dynamics of conflicts in Wikipedia
In this work we study the dynamical features of editorial wars in Wikipedia
(WP). Based on our previously established algorithm, we build up samples of
controversial and peaceful articles and analyze the temporal characteristics of
the activity in these samples. On short time scales, we show that there is a
clear correspondence between conflict and burstiness of activity patterns, and
that memory effects play an important role in controversies. On long time
scales, we identify three distinct developmental patterns for the overall
behavior of the articles. We are able to distinguish cases eventually leading
to consensus from those cases where a compromise is far from achievable.
Finally, we analyze discussion networks and conclude that edit wars are mainly
fought by only a few editors.
Comment: Supporting information adde
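The "burstiness of activity patterns" mentioned in this abstract can be made concrete with a small sketch. The abstract does not say which measure the authors use; the code below assumes one common definition, the coefficient B = (sigma - mu) / (sigma + mu) computed over the gaps between consecutive events, where mu and sigma are the mean and standard deviation of those gaps. The function name `burstiness` and the sample timestamps are illustrative only.

```python
from statistics import mean, pstdev


def burstiness(timestamps):
    """Return B = (sigma - mu) / (sigma + mu) for the gaps between events.

    Assumed definition, not taken from the abstract: B is -1 for perfectly
    regular activity, near 0 for Poisson-like activity, and approaches 1
    for highly bursty activity.
    """
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three events to compare gaps")
    mu = mean(gaps)
    sigma = pstdev(gaps)
    if mu + sigma == 0:
        return 0.0  # every event shares one timestamp; B is undefined
    return (sigma - mu) / (sigma + mu)


# Hypothetical edit timestamps (arbitrary time units).
regular_edits = [0, 10, 20, 30, 40]
bursty_edits = [0, 1, 2, 3, 100, 101, 102, 200]

regular_b = burstiness(regular_edits)
bursty_b = burstiness(bursty_edits)
```

Under this definition, evenly spaced edits give the minimum value, while edits clustered into short bursts separated by long pauses give a positive value.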
The most controversial topics in Wikipedia: A multilingual and geographical analysis
We present, visualize and analyse the similarities and differences between
the controversial topics related to "edit wars" identified in 10 different
language versions of Wikipedia. After a brief review of the related work we
describe the methods developed to locate, measure, and categorize the
controversial topics in the different languages. Visualizations of the degree
of overlap between the top 100 lists of most controversial articles in
different languages and the content related to geographical locations will be
presented. We discuss what the presented analysis and visualizations can tell
us about the multicultural aspects of Wikipedia and practices of
peer-production. Our results indicate that Wikipedia is more than just an
encyclopaedia; it is also a window into convergent and divergent social-spatial
priorities, interests and preferences.
Comment: This is a draft of a book chapter to be published in 2014 by
Scarecrow Press. Please cite as: Yasseri T., Spoerri A., Graham M., and
Kertész J., The most controversial topics in Wikipedia: A multilingual and
geographical analysis. In: Fichman P., Hara N., editors, Global
Wikipedia: International and cross-cultural issues in online collaboration.
Scarecrow Press (2014
Are anonymity-seekers just like everybody else? An analysis of contributions to Wikipedia from Tor
User-generated content sites routinely block contributions from users of
privacy-enhancing proxies like Tor because of a perception that proxies are a
source of vandalism, spam, and abuse. Although these blocks might be effective,
collateral damage in the form of unrealized valuable contributions from
anonymity seekers is invisible. One of the largest and most important
user-generated content sites, Wikipedia, has attempted to block contributions
from Tor users since as early as 2005. We demonstrate that these blocks have
been imperfect and that thousands of attempts to edit on Wikipedia through Tor
have been successful. We draw upon several data sources and analytical
techniques to measure and describe the history of Tor editing on Wikipedia over
time and to compare contributions from Tor users to those from other groups of
Wikipedia users. Our analysis suggests that although Tor users who slip through
Wikipedia's ban contribute content that is more likely to be reverted and to
revert others, their contributions are otherwise similar in quality to those
from other unregistered participants and to the initial contributions of
registered users.
Comment: To appear in the IEEE Symposium on Security & Privacy, May 202
Conflict and Computation on Wikipedia: a Finite-State Machine Analysis of Editor Interactions
What is the boundary between a vigorous argument and a breakdown of
relations? What drives a group of individuals across it? Taking Wikipedia as a
test case, we use a hidden Markov model to approximate the computational
structure and social grammar of more than a decade of cooperation and conflict
among its editors. Across a wide range of pages, we discover a bursty war/peace
structure where the systems can become trapped, sometimes for months, in a
computational subspace associated with significantly higher levels of
conflict-tracking "revert" actions. Distinct patterns of behavior characterize
the lower-conflict subspace, including tit-for-tat reversion. While a fraction
of the transitions between these subspaces are associated with top-down actions
taken by administrators, the effects are weak. Surprisingly, we find no
statistical signal that transitions are associated with the appearance of
particularly anti-social users, and only weak association with significant news
events outside the system. These findings are consistent with transitions being
driven by decentralized processes with no clear locus of control. Models of
belief revision in the presence of a common resource for information-sharing
predict the existence of two distinct phases: a disordered high-conflict phase,
and a frozen phase with spontaneously-broken symmetry. The bistability we
observe empirically may be a consequence of editor turn-over, which drives the
system to a critical point between them.
Comment: 23 pages, 3 figures. Matches published version. Code for HMM fitting
available at http://bit.ly/sfihmm ; time series and derived finite state
machines at bit.ly/wiki_hm
Circadian patterns of Wikipedia editorial activity: A demographic analysis
Wikipedia (WP) as a collaborative, dynamical system of humans is an
appropriate subject of social studies. Each single action of the members of
this society, i.e. editors, is well recorded and accessible. Using the
cumulative data of 34 Wikipedias in different languages, we try to characterize
and find the universalities and differences in temporal activity patterns of
editors. Based on this data, we estimate the geographical distribution of
editors for each WP across the globe. Furthermore, we clarify the differences
among different groups of WPs, which originate in the variance of cultural and
social features of the communities of editors.
Wikipedia as an encyclopaedia of life
In his 2003 essay E O Wilson outlined his vision for an “encyclopaedia of life” comprising “an electronic page for each species of organism on Earth”, each page containing “the scientific name of the species, a pictorial or genomic presentation of the primary type specimen on which its name is based, and a summary of its diagnostic traits.” Although the “quiet revolution” in biodiversity informatics has generated numerous online resources, including some directly inspired by Wilson's essay (e.g., http://ispecies.org, http://www.eol.org), we are still some way from the goal of having available online all relevant information about a species, such as its taxonomy, evolutionary history, genomics, morphology, ecology, and behaviour. While the biodiversity community has been developing a plethora of databases, some with overlapping goals and duplicated content, Wikipedia has been slowly growing to the point where it now has over 100,000 pages on biological taxa. My goal in this essay is to explore the idea that, largely independent of the efforts of biodiversity informatics and well-funded international efforts, Wikipedia (http://en.wikipedia.org/wiki/Main_Page) has emerged as potentially the best platform for fulfilling E O Wilson’s vision.
VoG: Summarizing and Understanding Large Graphs
How can we succinctly describe a million-node graph with a few simple
sentences? How can we measure the "importance" of a set of discovered subgraphs
in a large graph? These are exactly the problems we focus on. Our main ideas
are to construct a "vocabulary" of subgraph-types that often occur in real
graphs (e.g., stars, cliques, chains), and from a set of subgraphs, find the
most succinct description of a graph in terms of this vocabulary. We measure
success in a well-founded way by means of the Minimum Description Length (MDL)
principle: a subgraph is included in the summary if it decreases the total
description length of the graph.
Our contributions are three-fold: (a) formulation: we provide a principled
encoding scheme to choose vocabulary subgraphs; (b) algorithm: we develop
VoG, an efficient method to minimize the description cost, and (c)
applicability: we report experimental results on multi-million-edge real
graphs, including Flickr and the Notre Dame web graph.
Comment: SIAM International Conference on Data Mining (SDM) 201
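The inclusion rule stated in this abstract (keep a subgraph only if it decreases the total description length) can be illustrated with a toy sketch. This is not VoG's actual encoding scheme: the cost model below is an assumption made for illustration, charging a fixed number of bits per candidate structure plus a flat `edge_bits` for every edge the summary leaves unexplained. All names and numbers are hypothetical.

```python
def total_cost(edges, summary, edge_bits):
    """Toy description length: structure bits plus bits for unexplained edges."""
    covered = set()
    for structure in summary:
        covered |= structure["edges"]
    model_bits = sum(structure["bits"] for structure in summary)
    return model_bits + edge_bits * len(edges - covered)


def greedy_summary(edges, candidates, edge_bits=8.0):
    """Add each candidate in turn only if it lowers the total cost."""
    summary = []
    best = total_cost(edges, summary, edge_bits)
    for candidate in candidates:
        cost = total_cost(edges, summary + [candidate], edge_bits)
        if cost < best:
            summary.append(candidate)
            best = cost
    return summary, best


# Hypothetical graph: a five-leaf star around node 0 plus one stray edge.
star_edges = {(0, leaf) for leaf in range(1, 6)}
graph_edges = star_edges | {(6, 7)}

# Hypothetical candidate structures with made-up encoding costs in bits.
candidates = [
    {"name": "star", "bits": 12, "edges": star_edges},
    {"name": "chain", "bits": 20, "edges": {(6, 7)}},
]

summary, cost = greedy_summary(graph_edges, candidates)
chosen = [structure["name"] for structure in summary]
```

With these numbers the star is kept because describing it costs less than listing its five edges one by one, while the chain is rejected because encoding it costs more than the single edge it explains.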