A Graph-structured Dataset for Wikipedia Research

Author: Aspert Nicolas
Miz Volodymyr
Ricaud Benjamin
Vandergheynst Pierre
Publication venue: Association for Computing Machinery (ACM)
Publication date: 20/03/2019

Wikipedia is a rich and invaluable source of information. Its central place on the Web makes it a particularly interesting object of study for scientists. Researchers from different domains have used various complex datasets related to Wikipedia to study language, social behavior, knowledge organization, and network theory. Although Wikipedia is a scientific treasure, the large size of the dataset hinders pre-processing and can be a serious obstacle for potential new studies. This issue is particularly acute in scientific domains where researchers may not be savvy in technical and data-processing matters. On the one hand, Wikipedia dumps are large, which makes parsing them and extracting relevant information cumbersome. On the other hand, the API is straightforward to use but restricted to a relatively small number of requests. The middle ground lies at the mesoscopic scale, where researchers need a subset of Wikipedia ranging from thousands to hundreds of thousands of pages, and no efficient solution exists at this scale. In this work, we propose an efficient data structure for making requests and accessing subnetworks of Wikipedia pages and categories. We provide convenient tools for accessing and filtering viewership statistics, or "pagecounts", of Wikipedia web pages. The dataset organization leverages principles of graph databases that allow rapid and intuitive access to subgraphs of Wikipedia articles and categories. The dataset and deployment guidelines are available on the LTS2 website: https://lts2.epfl.ch/Datasets/Wikipedia/

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Temporal Wikipedia search by edits and linkage

Author: Benczúr András
Göbölös-Szabó Júlia
Publication date: 01/01/2013

SZTAKI Publication Repository

Dynamics of conflicts in Wikipedia

Author: András Kornai
András Rung
Robert Sumi
János Kertész
Taha Yasseri
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2012

In this work we study the dynamical features of editorial wars in Wikipedia (WP). Using our previously established algorithm, we build samples of controversial and peaceful articles and analyze the temporal characteristics of the activity in these samples. On short time scales, we show that there is a clear correspondence between conflict and burstiness of activity patterns, and that memory effects play an important role in controversies. On long time scales, we identify three distinct developmental patterns for the overall behavior of the articles. We are able to distinguish cases that eventually lead to consensus from those where a compromise is far from achievable. Finally, we analyze discussion networks and conclude that edit wars are mainly fought by only a few editors. Comment: Supporting information added

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

Crossref

SZTAKI Publication Repository

Directory of Open Access Journals

PubMed Central

Oxford University Research Archive

FigShare

Evolution of Wikipedia's Category Structure

Author: Akdağ Salah Almila Alkım
Gao Cheng
Scharnhorst Andrea
Suchecki Krzysztof
Publication date: 01/01/2012

Wikipedia, as a social phenomenon of collaborative knowledge creation, has been studied extensively from various points of view. The category system of Wikipedia, introduced in 2004, has attracted relatively little attention. In this study, we focus on the documentation of knowledge and the transformation of this documentation over time. We take Wikipedia as a proxy for knowledge in general and its category system as an aspect of the structure of this knowledge. We investigate the evolution of the category structure of the English Wikipedia from its birth in 2004 to 2008. We treat the category system as if it were a hierarchical Knowledge Organization System, capturing the changes in the distributions of the top categories. We investigate how the clustering of articles, defined by the category system, matches the direct link network between the articles, and show how this match changes over time. We find the Wikipedia category network mostly stable, but with occasional reorganizations. We show that the clustering matches the link structure quite well, except during short periods preceding the reorganizations. Comment: Preprint of an article submitted for consideration in Advances in Complex Systems (2012) http://www.worldscinet.com/acs/, 19 pages, 7 figures

arXiv.org e-Print Archive

Crossref

Sabanci University Research Database

Digital.CSIC

Are anonymity-seekers just like everybody else? An analysis of contributions to Wikipedia from Tor

Author: Champion Kaylea
Forte Andrea
Greenstadt Rachel
Hill Benjamin Mako
Tran Chau
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 15/02/2020

User-generated content sites routinely block contributions from users of privacy-enhancing proxies like Tor because of a perception that proxies are a source of vandalism, spam, and abuse. Although these blocks might be effective, collateral damage in the form of unrealized valuable contributions from anonymity seekers is invisible. One of the largest and most important user-generated content sites, Wikipedia, has attempted to block contributions from Tor users since as early as 2005. We demonstrate that these blocks have been imperfect and that thousands of attempts to edit on Wikipedia through Tor have been successful. We draw upon several data sources and analytical techniques to measure and describe the history of Tor editing on Wikipedia over time and to compare contributions from Tor users to those from other groups of Wikipedia users. Our analysis suggests that although Tor users who slip through Wikipedia's ban contribute content that is more likely to be reverted and to revert others, their contributions are otherwise similar in quality to those from other unregistered participants and to the initial contributions of registered users. Comment: To appear in the IEEE Symposium on Security & Privacy, May 202

arXiv.org e-Print Archive

Crossref

Guided generation of pedagogical concept maps from the Wikipedia

Author: Lahti Lauri
Publication venue: Association for the Advancement of Computing in Education (AACE)
Publication date: 01/01/2009

We propose a new method for guided generation of concept maps from open-access online knowledge resources such as wikis. Based on this method, we have implemented a prototype that extracts semantic relations from sentences surrounding hyperlinks in Wikipedia's articles and lets a learner create customized learning objects in real time, based on collaborative recommendations that take her prior knowledge into account. Open-source modules enable pedagogically motivated exploration in wiki spaces, corresponding to an intelligent tutoring system. The method extracted compact noun–verb–noun phrases, suggested for labeling arcs between nodes that were labeled with article titles. On average, 80 percent of these phrases were useful, while their length was only 20 percent of the length of the original sentences. Experiments indicate that even simple analysis algorithms can effectively support user-initiated information retrieval and the building of intuitive learning objects that follow the learner's needs. Peer reviewed

Aaltodoc Publication Archive