Can Who-Edits-What Predict Edit Survival?
As the number of contributors to online peer-production systems grows, it
becomes increasingly important to predict whether the edits that users make
will eventually be beneficial to the project. Existing solutions either rely on
a user reputation system or consist of a highly specialized predictor that is
tailored to a specific peer-production system. In this work, we explore a
different point in the solution space that goes beyond user reputation but does
not involve any content-based feature of the edits. We view each edit as a game
between the editor and the component of the project. We posit that the
probability that an edit is accepted is a function of the editor's skill, of
the difficulty of editing the component and of a user-component interaction
term. Our model is broadly applicable, as it only requires observing data about
who makes an edit, what the edit affects and whether the edit survives or not.
We apply our model on Wikipedia and the Linux kernel, two examples of
large-scale peer-production systems, and we seek to understand whether it can
effectively predict edit survival: in both cases, we provide a positive answer.
Our approach significantly outperforms those based solely on user reputation
and bridges the gap with specialized predictors that use content-based
features. It is simple to implement, computationally inexpensive, and in
addition it enables us to discover interesting structure in the data.
Comment: Accepted at KDD 201
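The acceptance model described above could be sketched, for instance, as a logistic function of the three terms the abstract names. The function name, parameter values, and latent scale below are purely illustrative assumptions, not the paper's actual implementation:

```python
import math

def accept_probability(skill, difficulty, interaction):
    """Hypothetical sketch: probability that an edit survives, as a
    logistic function of the editor's skill, the component's difficulty,
    and a user-component interaction term (arbitrary latent scale)."""
    return 1.0 / (1.0 + math.exp(-(skill - difficulty + interaction)))

# A skilled editor on an easy component is likely to have the edit accepted:
print(accept_probability(skill=2.0, difficulty=0.5, interaction=0.0))
```

Fitting such parameters would only require the observational triples the abstract mentions: who made the edit, what it affected, and whether it survived.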
Damage Detection and Mitigation in Open Collaboration Applications
Collaborative functionality is changing the way information is amassed, refined, and disseminated in online environments. A subclass of these systems, characterized by open collaboration, uniquely allows participants to *modify* content with low barriers to entry. A prominent example and our case study, English Wikipedia, exemplifies the vulnerabilities: more than 7% of its edits are blatantly unconstructive. Our measurement studies show this damage manifests in novel socio-technical forms, limiting the effectiveness of computational detection strategies from related domains. In turn, this has left much of the mitigation to a poorly organized and ill-routed human workforce. We aim to improve all facets of this incident-response workflow.
Complementing language-based solutions, we first develop content-agnostic predictors of damage. We implicitly glean reputations for system entities and overcome sparse behavioral histories with a spatial reputation model that combines evidence from multiple granularities. We also identify simple yet indicative metadata features that capture participatory dynamics and content maturation. When brought to bear on damage corpora, our contributions: (1) advance benchmarks over a broad set of security issues (vandalism), (2) perform well in the first anti-spam-specific approach, and (3) demonstrate their portability across diverse open-collaboration use cases.
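One plausible reading of the spatial reputation idea is a back-off over granularities: when the finest-grained entity (e.g. a single user) has too little history, fall back to coarser evidence such as an IP range or country. The function, keys, and threshold below are invented for illustration, not the thesis's actual algorithm:

```python
def spatial_reputation(histories, hierarchy, min_evidence=5):
    """Return a damage-free ratio from the finest granularity with
    enough behavioral history; names and threshold are illustrative.

    histories: dict mapping an entity key to (good_edits, bad_edits)
    hierarchy: keys ordered finest to coarsest, e.g. user -> range -> country
    """
    for key in hierarchy:
        good, bad = histories.get(key, (0, 0))
        if good + bad >= min_evidence:
            return good / (good + bad)
    return 0.5  # no usable evidence at any granularity: neutral prior

# The single user has only one edit, so evidence comes from the IP range:
histories = {"user:203.0.113.7": (1, 0), "range:203.0.113.0/24": (8, 2)}
print(spatial_reputation(histories, ["user:203.0.113.7", "range:203.0.113.0/24"]))
```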
Probabilities generated by our classifiers can also intelligently route human assets using prioritization schemes optimized for capture rate or impact minimization. Organizational primitives are introduced that improve workforce efficiency. These strategies are then implemented in a tool (STiki) that has been used to revert more than 350,000 damaging instances from Wikipedia. These uses are analyzed to learn about human aspects of the edit-review process, including scalability, motivation, and latency. Finally, we conclude by measuring the practical impact of this work, discussing how to better integrate our solutions, and revealing outstanding vulnerabilities that speak to research challenges for open-collaboration security.
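The two prioritization objectives can be sketched as scoring rules: for capture rate, rank purely by damage probability; for impact minimization, weight that probability by an exposure proxy such as page views. The field names here are assumptions for illustration only:

```python
def prioritize(edits, objective="capture"):
    # Rank pending edits for human review. "capture" maximizes the rate of
    # damage caught per review; "impact" front-loads edits whose damage,
    # if real, would be seen the most (probability x exposure).
    if objective == "impact":
        key = lambda e: e["p_damage"] * e["views"]
    else:
        key = lambda e: e["p_damage"]
    return sorted(edits, key=key, reverse=True)

queue = [{"id": 1, "p_damage": 0.9, "views": 10},
         {"id": 2, "p_damage": 0.5, "views": 1000}]
print([e["id"] for e in prioritize(queue, "capture")])  # most likely damage first
print([e["id"] for e in prioritize(queue, "impact")])   # most visible damage first
```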
Finding Destinations in Search Engine Results
It is generally understood that information about products and services is essential in creating consumers' perceptions of and expectations towards tourism experiences. One of the channels potential tourists rely on is word-of-mouth, whose importance has increased sharply since the rise of websites that allow tourists to share their experiences (consumer-generated content). In this study we explore this issue by examining the prominence of one type of user-generated content, Wikipedia, in destination search results. We found that Wikipedia articles appear near the top of the retrieved results in nearly all of the top search engines. Implications are drawn regarding the use of Wikipedia articles to promote destinations.
Liquid Journals: Knowledge Dissemination in the Web Era
In this paper we redefine the notion of "scientific journal" to update it to the age of the Web. We explore the historical reasons behind the current journal model, and we show that this model is essentially the same today, even though the Web has made dissemination essentially free. We propose a notion of liquid and personal journals that evolve continuously in time and that are targeted to serve individuals or communities of arbitrarily small or large scales. The liquid journals provide "interesting" content, in the form of "scientific contributions" that are "related" to a certain paper, topic, or area, and that are posted (on web sites, in repositories, or in traditional journals) by "inspiring" researchers. As such, the liquid journal separates the notion of "publishing" (which can be achieved by submitting to traditional peer-reviewed journals or just by posting content on the Web) from the appearance of contributions in the journals, which are essentially collections of content. In this paper we introduce the liquid journal model, and demonstrate through some examples its value to individuals and communities. Finally, we describe an architecture and a working prototype that implements the proposed model.
Wikipedia vandalism detection
Wikipedia is an online encyclopedia that anyone can edit. The fact that
there are almost no restrictions on contributing content is at the core of its
success. However, it also attracts pranksters, lobbyists, spammers and other
people who degrade Wikipedia's content. One of the most frequent kinds
of damage is vandalism, which is defined as any bad-faith attempt to damage
Wikipedia's integrity.
For some years, the Wikipedia community has been fighting vandalism
using automatic detection systems. In this work, we develop one such
system, which won the 1st International Competition on Wikipedia Vandalism
Detection. This system consists of a feature set exploiting the textual
content of Wikipedia articles. We performed a study of different supervised
classification algorithms for this task, concluding that ensemble methods
such as Random Forest and LogitBoost are clearly superior.
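As an illustration of the kind of ensemble classifier the study favors, a Random Forest can be trained on per-edit feature vectors. The toy features below (uppercase ratio, vulgar-word count, byte delta) are invented stand-ins, not the thesis's actual feature set, and the sketch assumes scikit-learn is available:

```python
from sklearn.ensemble import RandomForestClassifier

# Toy per-edit features: [uppercase ratio, vulgar-word count, byte delta];
# labels: 1 = vandalism, 0 = regular edit. Purely illustrative data.
X = [[0.90, 3, -120], [0.10, 0, 45], [0.80, 5, -300],
     [0.05, 0, 12], [0.70, 2, -80], [0.12, 0, 60]]
y = [1, 0, 1, 0, 1, 0]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print(clf.predict([[0.85, 4, -200]])[0])  # flags the suspicious edit
```

Because the features are metadata-like ratios and counts rather than raw words, a pipeline of this shape is also largely language independent, as the abstract notes.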
After that, we combine this system with two other leading approaches
based on different kinds of features: metadata analysis and reputation. This
joint system obtains one of the best results reported in the literature. We
also conclude that our approach is mostly language independent, so we can
adapt it to languages other than English with minor changes.
Mola Velasco, SM. (2011). Wikipedia vandalism detection. http://hdl.handle.net/10251/1587
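Combining the textual system with the metadata and reputation approaches could be as simple as a weighted average of each subsystem's vandalism probability; the weights and function below are illustrative assumptions, not those of the actual joint system:

```python
def combined_score(p_text, p_metadata, p_reputation, weights=(0.5, 0.3, 0.2)):
    # Weighted average of three subsystems' vandalism probabilities;
    # in practice the weights would be tuned on a validation corpus.
    w_t, w_m, w_r = weights
    return w_t * p_text + w_m * p_metadata + w_r * p_reputation

print(combined_score(0.9, 0.6, 0.3))
```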