43 research outputs found
A Graph-structured Dataset for Wikipedia Research
Wikipedia is a rich and invaluable source of information. Its central place
on the Web makes it a particularly interesting object of study for scientists.
Researchers from different domains have used various complex datasets related to
Wikipedia to study language, social behavior, knowledge organization, and
network theory. Although Wikipedia is a scientific treasure, the large size of
the dataset hinders pre-processing and may be a challenging obstacle for
potential new studies. This issue is particularly acute in scientific domains
where researchers may lack technical and data-processing expertise. On the one
hand, Wikipedia dumps are large, which makes parsing and extracting relevant
information cumbersome. On the other hand, the API is straightforward to use
but restricted to a relatively small number of requests. The middle ground is
the mesoscopic scale, where researchers need a subset of Wikipedia ranging from
thousands to hundreds of thousands of pages, but no efficient solution exists
at this scale.
In this work, we propose an efficient data structure to make requests and
access subnetworks of Wikipedia pages and categories. We provide convenient
tools for accessing and filtering viewership statistics or "pagecounts" of
Wikipedia web pages. The dataset organization leverages principles of graph
databases that allow rapid and intuitive access to subgraphs of Wikipedia
articles and categories. The dataset and deployment guidelines are available on
the LTS2 website: https://lts2.epfl.ch/Datasets/Wikipedia/
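As a rough illustration of what accessing a subgraph of pages and categories can look like, the following minimal sketch extracts every node within a fixed number of links of a seed node. It is not the dataset's actual interface: the function name `k_hop_subgraph`, the adjacency-dictionary representation, and the toy graph are all assumptions made for this example.

```python
from collections import deque


def k_hop_subgraph(adjacency, seed, max_hops):
    """Return (nodes, edges) for the subgraph within max_hops links of seed.

    adjacency is a hypothetical in-memory stand-in for a graph store:
    a dict mapping each page or category title to the titles it links to.
    """
    distance = {seed: 0}
    queue = deque([seed])
    # Breadth-first search that stops expanding at the hop limit.
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    nodes = set(distance)
    # Keep only the links whose two endpoints were both reached.
    edges = {(u, v) for u in nodes for v in adjacency.get(u, ()) if v in nodes}
    return nodes, edges


# Toy graph invented for illustration; it is not taken from the dataset.
toy_graph = {
    "Category:Physics": ["Quantum mechanics", "Category:Optics"],
    "Category:Optics": ["Laser"],
    "Quantum mechanics": ["Laser", "Erwin Schrödinger"],
    "Laser": [],
    "Erwin Schrödinger": ["Vienna"],
    "Vienna": [],
}

nodes, edges = k_hop_subgraph(toy_graph, "Category:Physics", 1)
print(sorted(nodes))
print(sorted(edges))
```

A graph database would typically answer the same question with a single traversal query. The breadth-first search here only shows the shape of the result: a node set plus the links among those nodes.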
Dynamics of conflicts in Wikipedia
In this work we study the dynamical features of editorial wars in Wikipedia
(WP). Based on our previously established algorithm, we build up samples of
controversial and peaceful articles and analyze the temporal characteristics of
the activity in these samples. On short time scales, we show that there is a
clear correspondence between conflict and burstiness of activity patterns, and
that memory effects play an important role in controversies. On long time
scales, we identify three distinct developmental patterns for the overall
behavior of the articles. We are able to distinguish cases eventually leading
to consensus from those cases where a compromise is far from achievable.
Finally, we analyze discussion networks and conclude that edit wars are mainly
fought by only a few editors. Comment: Supporting information added
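The abstract does not say how burstiness is quantified. One commonly used measure, assumed here purely for illustration, is the coefficient B = (σ − μ) / (σ + μ) computed over the time gaps between consecutive edits, where μ and σ are the mean and standard deviation of those gaps. B is −1 for perfectly regular activity, near 0 for Poisson-like activity, and approaches 1 for highly bursty activity. The function name and the sample timestamps below are hypothetical.

```python
from statistics import mean, pstdev


def burstiness(timestamps):
    """Burstiness coefficient B = (sigma - mu) / (sigma + mu) of inter-event times.

    This measure is an assumption for illustration; the abstract does not
    specify which definition of burstiness the authors use.
    """
    ordered = sorted(timestamps)
    if len(ordered) < 3:
        raise ValueError("need at least three events to get two intervals")
    # Gaps between consecutive edit times.
    intervals = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    mu = mean(intervals)
    sigma = pstdev(intervals)
    if mu + sigma == 0:
        raise ValueError("all events share the same timestamp")
    return (sigma - mu) / (sigma + mu)


# Hypothetical edit times (arbitrary units), invented for this example.
regular_edits = [0, 1, 2, 3, 4]      # evenly spaced edits
bursty_edits = [0, 1, 2, 3, 100]     # a burst followed by a long silence

print(burstiness(regular_edits))
print(burstiness(bursty_edits))
```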
Evolution of Wikipedia's Category Structure
Wikipedia, as a social phenomenon of collaborative knowledge creation, has
been studied extensively from various points of view. The category system of
Wikipedia, introduced in 2004, has attracted relatively little attention. In
this study, we focus on the documentation of knowledge, and the transformation
of this documentation with time. We take Wikipedia as a proxy for knowledge in
general and its category system as an aspect of the structure of this
knowledge. We investigate the evolution of the category structure of the
English Wikipedia from its birth in 2004 to 2008. We treat the category system
as if it were a hierarchical Knowledge Organization System, capturing the changes
in the distributions of the top categories. We investigate how the clustering
of articles, defined by the category system, matches the direct link network
between the articles and show how it changes over time. We find the Wikipedia
category network mostly stable, but with occasional reorganization. We show
that the clustering matches the link structure quite well, except for short
periods preceding the reorganizations. Comment: Preprint of an article submitted
for consideration in Advances in Complex Systems (2012)
http://www.worldscinet.com/acs/, 19 pages, 7 figures
Are anonymity-seekers just like everybody else? An analysis of contributions to Wikipedia from Tor
User-generated content sites routinely block contributions from users of
privacy-enhancing proxies like Tor because of a perception that proxies are a
source of vandalism, spam, and abuse. Although these blocks might be effective,
collateral damage in the form of unrealized valuable contributions from
anonymity seekers is invisible. One of the largest and most important
user-generated content sites, Wikipedia, has attempted to block contributions
from Tor users since as early as 2005. We demonstrate that these blocks have
been imperfect and that thousands of attempts to edit on Wikipedia through Tor
have been successful. We draw upon several data sources and analytical
techniques to measure and describe the history of Tor editing on Wikipedia over
time and to compare contributions from Tor users to those from other groups of
Wikipedia users. Our analysis suggests that although Tor users who slip through
Wikipedia's ban contribute content that is more likely to be reverted and to
revert others, their contributions are otherwise similar in quality to those
from other unregistered participants and to the initial contributions of
registered users. Comment: To appear in the IEEE Symposium on Security & Privacy, May 202
Guided generation of pedagogical concept maps from the Wikipedia
We propose a new method for guided generation of concept maps from open access online knowledge resources such as wikis. Based on this method we have implemented a prototype that extracts semantic relations from sentences surrounding hyperlinks in Wikipedia's articles and lets a learner create customized learning objects in real time, based on collaborative recommendations that take her earlier knowledge into account. Open source modules enable pedagogically motivated exploration in wiki spaces, corresponding to an intelligent tutoring system. The method extracted compact noun–verb–noun phrases, suggested for labeling arcs between nodes that were labeled with article titles. On average, 80 percent of these phrases were useful while their length was only 20 percent of the length of the original sentences. Experiments indicate that even simple analysis algorithms can support user-initiated information retrieval well and help build intuitive learning objects that follow the learner's needs. Peer reviewed