Can Who-Edits-What Predict Edit Survival?
As the number of contributors to online peer-production systems grows, it
becomes increasingly important to predict whether the edits that users make
will eventually be beneficial to the project. Existing solutions either rely on
a user reputation system or consist of a highly specialized predictor that is
tailored to a specific peer-production system. In this work, we explore a
different point in the solution space that goes beyond user reputation but does
not involve any content-based feature of the edits. We view each edit as a game
between the editor and the component of the project. We posit that the
probability that an edit is accepted is a function of the editor's skill, of
the difficulty of editing the component and of a user-component interaction
term. Our model is broadly applicable, as it only requires observing data about
who makes an edit, what the edit affects and whether the edit survives or not.
We apply our model on Wikipedia and the Linux kernel, two examples of
large-scale peer-production systems, and we seek to understand whether it can
effectively predict edit survival: in both cases, we provide a positive answer.
Our approach significantly outperforms those based solely on user reputation
and bridges the gap with specialized predictors that use content-based
features. It is simple to implement, computationally inexpensive, and in
addition it enables us to discover interesting structure in the data.
Comment: Accepted at KDD 201
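The acceptance model described above could be sketched, for instance, as a logistic function of the three terms the abstract names. The function name, parameter values, and latent scale below are purely illustrative assumptions, not the paper's actual implementation:

```python
import math

def accept_probability(skill, difficulty, interaction):
    """Hypothetical sketch: probability that an edit survives, as a
    logistic function of the editor's skill, the component's difficulty,
    and a user-component interaction term (arbitrary latent scale)."""
    return 1.0 / (1.0 + math.exp(-(skill - difficulty + interaction)))

# A skilled editor on an easy component is likely to have the edit accepted:
print(accept_probability(skill=2.0, difficulty=0.5, interaction=0.0))
```

Fitting such parameters would only require the observational triples the abstract mentions: who made the edit, what it affected, and whether it survived.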
Damage Detection and Mitigation in Open Collaboration Applications
Collaborative functionality is changing the way information is amassed, refined, and disseminated in online environments. A subclass of these systems, characterized by open collaboration, uniquely allows participants to *modify* content with low barriers to entry. A prominent example and our case study, English Wikipedia, exemplifies the vulnerabilities: more than 7% of its edits are blatantly unconstructive. Our measurement studies show this damage manifests in novel socio-technical forms, limiting the effectiveness of computational detection strategies from related domains. In turn, this has left much of the mitigation to a poorly organized and ill-routed human workforce. We aim to improve all facets of this incident-response workflow.
Complementing language-based solutions, we first develop content-agnostic predictors of damage. We implicitly glean reputations for system entities and overcome sparse behavioral histories with a spatial reputation model that combines evidence from multiple granularities. We also identify simple yet indicative metadata features that capture participatory dynamics and content maturation. When brought to bear on damage corpora, our contributions: (1) advance benchmarks over a broad set of security issues (vandalism), (2) perform well in the first anti-spam-specific approach, and (3) demonstrate their portability across diverse open-collaboration use cases.
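One plausible reading of the spatial reputation idea is a back-off over granularities: when the finest-grained entity (e.g. a single user) has too little history, fall back to coarser evidence such as an IP range or country. The function, keys, and threshold below are invented for illustration, not the thesis's actual algorithm:

```python
def spatial_reputation(histories, hierarchy, min_evidence=5):
    """Return a damage-free ratio from the finest granularity with
    enough behavioral history; names and threshold are illustrative.

    histories: dict mapping an entity key to (good_edits, bad_edits)
    hierarchy: keys ordered finest to coarsest, e.g. user -> range -> country
    """
    for key in hierarchy:
        good, bad = histories.get(key, (0, 0))
        if good + bad >= min_evidence:
            return good / (good + bad)
    return 0.5  # no usable evidence at any granularity: neutral prior

# The single user has only one edit, so evidence comes from the IP range:
histories = {"user:203.0.113.7": (1, 0), "range:203.0.113.0/24": (8, 2)}
print(spatial_reputation(histories, ["user:203.0.113.7", "range:203.0.113.0/24"]))
```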
Probabilities generated by our classifiers can also intelligently route human assets using prioritization schemes optimized for capture rate or impact minimization. Organizational primitives are introduced that improve workforce efficiency. These strategies are then implemented in a tool (STiki) that has been used to revert more than 350,000 damaging instances from Wikipedia. These uses are analyzed to learn about human aspects of the edit-review process, including scalability, motivation, and latency. Finally, we conclude by measuring the practical impact of this work, discussing how to better integrate our solutions, and revealing outstanding vulnerabilities that speak to research challenges for open-collaboration security.
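The two prioritization objectives can be sketched as scoring rules: for capture rate, rank purely by damage probability; for impact minimization, weight that probability by an exposure proxy such as page views. The field names here are assumptions for illustration only:

```python
def prioritize(edits, objective="capture"):
    # Rank pending edits for human review. "capture" maximizes the rate of
    # damage caught per review; "impact" front-loads edits whose damage,
    # if real, would be seen the most (probability x exposure).
    if objective == "impact":
        key = lambda e: e["p_damage"] * e["views"]
    else:
        key = lambda e: e["p_damage"]
    return sorted(edits, key=key, reverse=True)

queue = [{"id": 1, "p_damage": 0.9, "views": 10},
         {"id": 2, "p_damage": 0.5, "views": 1000}]
print([e["id"] for e in prioritize(queue, "capture")])  # most likely damage first
print([e["id"] for e in prioritize(queue, "impact")])   # most visible damage first
```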
Finding Destinations in Search Engine Results
It is generally understood that information about products and services is essential in creating consumers' perceptions of and expectations towards tourism experiences. One of the channels potential tourists rely on is word-of-mouth, whose importance has increased sharply since the rise of websites that allow tourists to share their experiences (consumer-generated content). In this study we explore this issue by examining the prominence of one type of user-generated content, Wikipedia, in destination search results. We found that Wikipedia articles appear near the top of the retrieved results in nearly all of the top search engines. Implications are drawn regarding the use of Wikipedia articles to promote destinations.
Liquid Journals: Knowledge Dissemination in the Web Era
In this paper we redefine the notion of "scientific journal" to update it to the age of the Web. We explore the historical reasons behind the current journal model, and we show that this model is essentially the same today, even though the Web has made dissemination essentially free. We propose a notion of liquid and personal journals that evolve continuously in time and that are targeted to serve individuals or communities of arbitrarily small or large scales. The liquid journals provide "interesting" content, in the form of "scientific contributions" that are "related" to a certain paper, topic, or area, and that are posted (on web sites, in repositories, or in traditional journals) by "inspiring" researchers. As such, the liquid journal separates the notion of "publishing" (which can be achieved by submitting to traditional peer-reviewed journals or just by posting content on the Web) from the appearance of contributions in the journals, which are essentially collections of content. In this paper we introduce the liquid journal model, and demonstrate through some examples its value to individuals and communities. Finally, we describe an architecture and a working prototype that implements the proposed model.
Wikipedia vandalism detection
Wikipedia is an online encyclopedia that anyone can edit. The fact that
there are almost no restrictions on contributing content is at the core of its
success. However, it also attracts pranksters, lobbyists, spammers and other
people who degrade Wikipedia's content. One of the most frequent kinds
of damage is vandalism, which is defined as any bad-faith attempt to damage
Wikipedia's integrity.
For some years, the Wikipedia community has been fighting vandalism
using automatic detection systems. In this work, we develop one such
system, which won the 1st International Competition on Wikipedia Vandalism
Detection. This system consists of a feature set exploiting the textual
content of Wikipedia articles. We performed a study of different supervised
classification algorithms for this task, concluding that ensemble methods
such as Random Forest and LogitBoost are clearly superior.
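As an illustration of the kind of ensemble classifier the study favors, a Random Forest can be trained on per-edit feature vectors. The toy features below (uppercase ratio, vulgar-word count, byte delta) are invented stand-ins, not the thesis's actual feature set, and the sketch assumes scikit-learn is available:

```python
from sklearn.ensemble import RandomForestClassifier

# Toy per-edit features: [uppercase ratio, vulgar-word count, byte delta];
# labels: 1 = vandalism, 0 = regular edit. Purely illustrative data.
X = [[0.90, 3, -120], [0.10, 0, 45], [0.80, 5, -300],
     [0.05, 0, 12], [0.70, 2, -80], [0.12, 0, 60]]
y = [1, 0, 1, 0, 1, 0]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print(clf.predict([[0.85, 4, -200]])[0])  # flags the suspicious edit
```

Because the features are metadata-like ratios and counts rather than raw words, a pipeline of this shape is also largely language independent, as the abstract notes.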
After that, we combine this system with two other leading approaches
based on different kinds of features: metadata analysis and reputation. This
joint system obtains one of the best results reported in the literature. We
also conclude that our approach is mostly language independent, so we can
adapt it to languages other than English with minor changes.
Mola Velasco, SM. (2011). Wikipedia vandalism detection. http://hdl.handle.net/10251/1587
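Combining the textual system with the metadata and reputation approaches could be as simple as a weighted average of each subsystem's vandalism probability; the weights and function below are illustrative assumptions, not those of the actual joint system:

```python
def combined_score(p_text, p_metadata, p_reputation, weights=(0.5, 0.3, 0.2)):
    # Weighted average of three subsystems' vandalism probabilities;
    # in practice the weights would be tuned on a validation corpus.
    w_t, w_m, w_r = weights
    return w_t * p_text + w_m * p_metadata + w_r * p_reputation

print(combined_score(0.9, 0.6, 0.3))
```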