10,927 research outputs found
Bayesian Hierarchical Modelling for Tailoring Metric Thresholds
Software is highly contextual. While there are cross-cutting `global'
lessons, individual software projects exhibit many `local' properties. This
data heterogeneity makes drawing local conclusions from global data dangerous.
A key research challenge is to construct locally accurate prediction models
that are informed by global characteristics and data volumes. Previous work has
tackled this problem using clustering and transfer learning approaches, which
identify locally similar characteristics. This paper applies a simpler approach
known as Bayesian hierarchical modeling. We show that hierarchical modeling
supports cross-project comparisons, while preserving local context. To
demonstrate the approach, we conduct a conceptual replication of an existing
study on setting software metrics thresholds. Our emerging results show our
hierarchical model reduces model prediction error compared to a global approach
by up to 50%.Comment: Short paper, published at MSR '18: 15th International Conference on
Mining Software Repositories May 28--29, 2018, Gothenburg, Swede
Identifying Unmaintained Projects in GitHub
Background: Open source software has an increasing importance in modern
software development. However, there is also a growing concern on the
sustainability of such projects, which are usually managed by a small number of
developers, frequently working as volunteers. Aims: In this paper, we propose
an approach to identify GitHub projects that are not actively maintained. Our
goal is to alert users about the risks of using these projects and possibly
motivate other developers to assume the maintenance of the projects. Method: We
train machine learning models to identify unmaintained or sparsely maintained
projects, based on a set of features about project activity (commits, forks,
issues, etc). We empirically validate the model with the best performance with
the principal developers of 129 GitHub projects. Results: The proposed machine
learning approach has a precision of 80%, based on the feedback of real open
source developers; and a recall of 96%. We also show that our approach can be
used to assess the risks of projects becoming unmaintained. Conclusions: The
model proposed in this paper can be used by open source users and developers to
identify GitHub projects that are not actively maintained anymore.Comment: Accepted at 12th International Symposium on Empirical Software
Engineering and Measurement (ESEM), 10 pages, 201
A Second Replicated Quantitative Analysis of Fault Distributions in Complex Software Systems
Background. Software engineering is in search for general principles that apply across contexts, for example to help guide software quality assurance. Fenton and Ohlsson presented such observations on fault distributions, which have been replicated once. Objectives.We intend to replicate their study a second time in a new environment. Method.We conducted a close replication, collecting defect data from five consecutive releases of a large software system in the telecommunications domain, and conducted the same analysis as in the original study. Results. The replication confirms results on un-evenly distributed faults over modules, and that fault proneness distribution persist over test phases. Size measures are not useful as predictors of fault proneness, while fault densities are of the same order of magnitude across releases and contexts. Conclusions. This replication confirms that the un-even distribution of defects motivates un-even distribution of quality assurance efforts, although predictors for such distribution of efforts are not sufficiently precise
- …