4 research outputs found

    Can Who-Edits-What Predict Edit Survival?

    As the number of contributors to online peer-production systems grows, it becomes increasingly important to predict whether the edits that users make will eventually be beneficial to the project. Existing solutions either rely on a user reputation system or consist of a highly specialized predictor tailored to a specific peer-production system. In this work, we explore a different point in the solution space that goes beyond user reputation but does not involve any content-based feature of the edits. We view each edit as a game between the editor and the component of the project being edited. We posit that the probability that an edit is accepted is a function of the editor's skill, of the difficulty of editing the component, and of a user-component interaction term. Our model is broadly applicable, as it only requires observing who makes an edit, what the edit affects, and whether the edit survives. We apply our model to Wikipedia and the Linux kernel, two examples of large-scale peer-production systems, and ask whether it can effectively predict edit survival: in both cases, the answer is positive. Our approach significantly outperforms those based solely on user reputation and bridges the gap with specialized predictors that use content-based features. It is simple to implement, computationally inexpensive, and it additionally enables us to discover interesting structure in the data.
    Comment: Accepted at KDD 201
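
    The abstract does not spell out the model's functional form; the sketch below is a minimal illustration only, assuming a logistic link that combines editor skill, component difficulty, and a low-dimensional user-component interaction (all function names, parameters, and numbers here are hypothetical):

        import math

        def edit_survival_prob(skill, difficulty, user_vec, component_vec):
            # Assumed form: sigmoid(skill - difficulty + user-component interaction).
            interaction = sum(u * v for u, v in zip(user_vec, component_vec))
            return 1.0 / (1.0 + math.exp(-(skill - difficulty + interaction)))

        # Toy example: a skilled editor working on a relatively easy component.
        p = edit_survival_prob(skill=0.8, difficulty=0.3,
                               user_vec=[0.1, -0.2], component_vec=[0.4, 0.1])
        print(f"predicted survival probability: {p:.2f}")

    In practice the skill, difficulty, and interaction parameters would be fitted from the observed (editor, component, survived) triples, for example by maximum likelihood.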

    INVocD: Identifier name vocabulary dataset

    An exploratory analysis of mobile development issues using Stack Overflow

    Analyzing 2.3 Million Maven Dependencies to Reveal an Essential Core in APIs

    This paper addresses the following question: does a small, essential core set of API members emerge from the actual usage of the API by client applications? To investigate this question, we study the 99 most popular libraries available in Maven Central and the 865,560 client programs that declare dependencies on them, totaling 2.3M dependencies. Our key findings are as follows: 43.5% of the dependencies declared by the clients are not used in the bytecode; all APIs contain a large proportion of rarely used types and a few frequently used types, and the ratio varies with the nature of the API, its size, and its design; we can systematically extract a reuse-core from APIs that is sufficient to serve most clients; the median size of this subset is 17% of the API, and it can serve 83% of the clients. This study is novel both in its scale and in its findings about unused dependencies and the reuse-core of APIs. Our results provide concrete insights to improve Maven's build process with a mechanism to detect unused dependencies. They also support the need to reduce the size of APIs to facilitate API learning and maintenance.
    Comment: 15 pages, 13 figures, 3 tables, 2 listings
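
    The abstract does not describe the exact procedure used to extract the reuse-core; the sketch below is one plausible greedy approach, assuming types are added in order of client popularity until a target fraction of clients is fully covered (client names and type names are made up):

        from collections import Counter

        def reuse_core(client_usage, target=0.8):
            # client_usage: dict mapping each client to the set of API types it uses.
            popularity = Counter(t for types in client_usage.values() for t in types)
            core = set()
            for t, _ in popularity.most_common():
                core.add(t)
                # A client is served once every type it uses is inside the core.
                served = sum(1 for types in client_usage.values() if types <= core)
                if served / len(client_usage) >= target:
                    break
            return core

        # Toy example with made-up clients and API types.
        usage = {
            "app1": {"List", "Map"},
            "app2": {"List"},
            "app3": {"List", "Map", "Stream"},
        }
        print(reuse_core(usage, target=0.66))

    Greedy popularity ordering is only one way to obtain such a subset; the paper reports that, at the median, the extracted reuse-core amounts to 17% of the API while serving 83% of the clients.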