Can Who-Edits-What Predict Edit Survival?
As the number of contributors to online peer-production systems grows, it
becomes increasingly important to predict whether the edits that users make
will eventually be beneficial to the project. Existing solutions either rely on
a user reputation system or consist of a highly specialized predictor that is
tailored to a specific peer-production system. In this work, we explore a
different point in the solution space that goes beyond user reputation but does
not involve any content-based feature of the edits. We view each edit as a game
between the editor and the component of the project. We posit that the
probability that an edit is accepted is a function of the editor's skill, of
the difficulty of editing the component and of a user-component interaction
term. Our model is broadly applicable, as it only requires observing data about
who makes an edit, what the edit affects and whether the edit survives or not.
We apply our model on Wikipedia and the Linux kernel, two examples of
large-scale peer-production systems, and we seek to understand whether it can
effectively predict edit survival: in both cases, we provide a positive answer.
Our approach significantly outperforms those based solely on user reputation
and bridges the gap with specialized predictors that use content-based
features. It is simple to implement, computationally inexpensive, and in
addition it enables us to discover interesting structure in the data.
Comment: Accepted at KDD 201
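To make the modeling idea in the abstract above concrete, here is a minimal sketch of a survival probability that depends on an editor's skill, a component's difficulty, and a user-component interaction term. The abstract does not give the functional form, so the logistic link, the latent vectors `user_vec` and `item_vec`, and the dot-product interaction are all assumptions made for illustration; the function and parameter names are hypothetical.

```python
import math


def sigmoid(x):
    """Standard logistic function, mapping any real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def survival_probability(skill, difficulty, user_vec, item_vec):
    """Probability that an edit survives (illustrative form, not the paper's).

    skill      -- scalar skill of the editor (assumed parameter)
    difficulty -- scalar difficulty of editing the component (assumed parameter)
    user_vec, item_vec -- assumed latent vectors; their dot product plays
                          the role of the user-component interaction term
    """
    interaction = sum(u * v for u, v in zip(user_vec, item_vec))
    return sigmoid(skill - difficulty + interaction)


# With no interaction, equal skill and difficulty give an even chance.
p_base = survival_probability(1.0, 1.0, [0.0, 0.0], [0.0, 0.0])

# A positive interaction (editor well matched to the component) raises it.
p_match = survival_probability(1.0, 1.0, [0.5, -0.2], [1.0, 0.0])
```

In a sketch like this, the only inputs needed per observation are who made the edit, which component it touched, and whether it survived, which matches the data requirements stated in the abstract.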
Analyzing 2.3 Million Maven Dependencies to Reveal an Essential Core in APIs
This paper addresses the following question: does a small, essential, core
set of API members emerge from the actual usage of the API by client
applications? To investigate this question, we study the 99 most popular
libraries available in Maven Central and the 865,560 client programs that
declare dependencies towards them, summing up to 2.3M dependencies. Our key
findings are as follows: 43.5% of the dependencies declared by the clients are
not used in the bytecode; all APIs contain a large part of rarely used types
and a few frequently used types, and the ratio varies according to the nature
of the API, its size and its design; we can systematically extract a reuse-core
from APIs that is sufficient to provide for most clients. The median size of
this subset is 17% of the API, and it can serve 83% of the clients. This study is
novel both in its scale and its findings about unused dependencies and the
reuse-core of APIs. Our results provide concrete insights to improve Maven's
build process with a mechanism to detect unused dependencies. They also support
the need to reduce the size of APIs to facilitate API learning and maintenance.
Comment: 15 pages, 13 figures, 3 tables, 2 listing
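The idea of a reuse-core described in the abstract above can be illustrated with a small sketch. The abstract does not specify the extraction procedure, so the greedy, frequency-ranked strategy below is an assumption, and the names `reuse_core`, `client_usage`, and `target_share` are hypothetical. A client counts as served when every API type it uses is in the core.

```python
from collections import Counter


def reuse_core(client_usage, target_share):
    """Grow a core of API types until a target share of clients is served.

    client_usage -- dict mapping a client name to the set of API types it uses
    target_share -- fraction of clients (0..1) the core should fully serve
    Returns (core, share_served). Greedy by usage frequency: an illustrative
    heuristic, not the procedure used in the study.
    """
    n = len(client_usage)
    if n == 0:
        return set(), 0.0

    # Count how many clients use each type, then rank most-used first
    # (ties broken alphabetically so the result is deterministic).
    counts = Counter(t for used in client_usage.values() for t in used)
    ranked = [t for t, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]

    core = set()

    def share_served():
        # A client is served when all types it uses are inside the core.
        return sum(1 for used in client_usage.values() if used <= core) / n

    for api_type in ranked:
        if share_served() >= target_share:
            break
        core.add(api_type)
    return core, share_served()


# Toy data: four clients using three types of one hypothetical API.
usage = {
    "client_a": {"List"},
    "client_b": {"List", "Map"},
    "client_c": {"List"},
    "client_d": {"List", "Map", "Rare"},
}
core, served = reuse_core(usage, target_share=0.75)
```

On this toy data, two of the three types are enough to serve three of the four clients, which mirrors the kind of small-core, high-coverage trade-off the abstract reports.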