Revisiting Process versus Product Metrics: a Large Scale Analysis
Numerous methods can build predictive models from software data. However,
what methods and conclusions should we endorse as we move from analytics
in-the-small (dealing with a handful of projects) to analytics in-the-large
(dealing with hundreds of projects)?
To answer this question, we recheck prior small-scale results (about process
versus product metrics for defect prediction and the granularity of metrics)
using 722,471 commits from 700 GitHub projects. We find that some analytics
in-the-small conclusions still hold when scaling up to analytics in-the-large.
For example, like prior work, we see that process metrics are better predictors
of defects than product metrics (the best process-based and product-based
learners achieve median recalls of 98% and 44%, and median AUCs of 95% and 54%,
respectively).
That said, we warn that it is unwise to trust metric importance results from
analytics in-the-small studies since those change dramatically when moving to
analytics in-the-large. Also, when reasoning in-the-large about hundreds of
projects, it is better to use predictions from multiple models (since
single-model predictions can become confused and exhibit high variance).
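As a rough illustration of that multi-model advice, the sketch below trains two off-the-shelf learners on hypothetical process metrics, averages their predicted probabilities, and scores recall and AUC. The features, synthetic data, and choice of learners are placeholders, not the paper's actual setup.

    # Sketch: combine several defect predictors instead of trusting one model.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import recall_score, roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(42)

    # Hypothetical process metrics per file: e.g. commit count, developer
    # count, lines added, lines deleted (synthetic stand-ins here).
    X = rng.random((1000, 4))
    y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.3, 1000) > 1.0  # synthetic labels

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    # Average predicted probabilities across learners, so that no single
    # (possibly high-variance) model decides alone.
    models = [RandomForestClassifier(random_state=0), LogisticRegression(max_iter=1000)]
    probs = np.mean([m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1] for m in models], axis=0)

    preds = probs >= 0.5
    print(f"recall={recall_score(y_te, preds):.2f}  AUC={roc_auc_score(y_te, probs):.2f}")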
Is My Project's Truck Factor Low? Theoretical and Empirical Considerations About the Truck Factor Threshold
The Truck Factor is a simple way, proposed by the agile community, to measure how a system's knowledge is distributed across a team of developers. It can be used to highlight potential project problems due to an inadequate distribution of system knowledge. Notwithstanding its relevance, only a few studies have investigated the Truck Factor and proposed ways to efficiently measure, evaluate, and use it. In particular, the effective use of the Truck Factor is limited by the lack of reliable thresholds. In this preliminary paper, we present a theoretical model of the Truck Factor and, in particular, use it to define the maximum achievable Truck Factor value in a project. Such a value matters for the definition of a reliable Truck Factor threshold. Furthermore, we document an experiment in which we apply the proposed model to real software projects, with the aim of comparing the maximum achievable Truck Factor value with the only threshold proposed in the literature. Our preliminary results show that the existing threshold has some limitations and problems.
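For context, a common greedy way to estimate the Truck Factor (in the spirit of Avelino et al.) is sketched below: repeatedly remove the developer who is a main author of the most files until more than half of the files are left without a main author. The authorship map here is a toy placeholder; in practice it would be derived from repository history (e.g. via git blame), and the paper's own procedure may differ.

    # Greedy Truck Factor sketch: how many top authors must leave before
    # knowledge of most files is lost?
    def truck_factor(file_authors: dict[str, set[str]], coverage: float = 0.5) -> int:
        files = set(file_authors)
        authors = set().union(*file_authors.values())
        removed = 0
        while True:
            covered = {f for f, auths in file_authors.items() if auths & authors}
            if len(covered) / len(files) < coverage:  # knowledge "lost"
                return removed
            # drop the author who is a main author of the most files
            top = max(authors, key=lambda a: sum(a in s for s in file_authors.values()))
            authors.remove(top)
            removed += 1

    # Toy example: two developers dominate, so the Truck Factor is 2.
    print(truck_factor({
        "core.py": {"alice"}, "api.py": {"alice"}, "ui.py": {"bob"},
        "db.py": {"alice", "bob"}, "docs.md": {"carol"},
    }))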
Evaluating Maintainability Prejudices with a Large-Scale Study of Open-Source Projects
Exaggeration or context changes can turn maintainability experience into
prejudice. For example, JavaScript is often seen as the least elegant language
and hence as having the lowest maintainability. Such prejudice should not guide
decisions without prior empirical validation. We formulate 10 hypotheses about
maintainability based on such prejudices and test them on a large set of
open-source projects (6,897 GitHub repositories, 402 million lines, 5
programming languages). We operationalize maintainability with five static
analysis metrics. We find that JavaScript code is not worse than other code,
Java code shows higher maintainability than C# code, and C code has longer
methods than other code. The quality of interface documentation is better in
Java code than in other code. Code developed by teams is not of higher
maintainability, and large code bases are not of lower maintainability.
Projects with high maintainability are not more popular or more often forked.
Overall, most hypotheses are not supported by the open-source data.
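As an illustration of how one such prejudice can be checked empirically, the sketch below compares a maintainability proxy (method length) between two language groups with a Mann-Whitney U test. The samples are synthetic stand-ins; the study's actual metrics and statistical procedure may differ.

    # Sketch: test "C code has longer methods than other code" on synthetic data.
    import numpy as np
    from scipy.stats import mannwhitneyu

    rng = np.random.default_rng(1)
    method_len_c = rng.lognormal(mean=3.0, sigma=0.5, size=500)      # stand-in: C methods
    method_len_other = rng.lognormal(mean=2.7, sigma=0.5, size=500)  # stand-in: other code

    # One-sided test: are C methods longer than methods in other code?
    stat, p = mannwhitneyu(method_len_c, method_len_other, alternative="greater")
    print(f"U = {stat:.0f}, p = {p:.3g}")  # a small p supports the hypothesis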
Rethinking Experiments in a Socio-Technical Perspective: The Case of Software Engineering
Experiments in computing share many characteristics with the traditional experimental method, but they also present significant differences from a practical perspective, due to their aim of producing software artifacts and to the central role played by human actors and organizations (e.g., programmers, project teams, software houses) involved in the software development process. By analyzing some of the most significant experiments in the subfield of software engineering, we aim to show how the conceptual framework that supports experimental methodology in this context needs to be extended from a socio-technical perspective.
Test case prioritization using test case diversification and fault-proneness estimations
Context: Regression testing activities greatly reduce the risk of releasing
faulty software. However, the size of the test suite grows throughout the
development process, resulting in time-consuming test-suite execution and
delayed feedback to the software development team. This has motivated
approaches such as test case prioritization (TCP) and test-suite reduction,
which aim for better results when resources are limited. In this regard,
approaches that exploit auxiliary sources of data, such as bug history, are
promising.
Objective: Our aim is to propose an approach for TCP that takes into account
test case coverage data, bug history, and test case diversification. To
evaluate this approach, we study its performance on real-world open-source
projects.
Method: The bug history is used to estimate the fault-proneness of source
code areas. Test case diversification is preserved by incorporating
fault-proneness into a clustering-based prioritization scheme.
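A minimal sketch of such a clustering-based scheme, on illustrative data, might look as follows; the paper's exact clustering and scoring details may differ.

    # Sketch: cluster tests by coverage (diversity), then pick round-robin
    # across clusters, highest fault-proneness-weighted coverage first.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    n_tests, n_units = 12, 8
    coverage = rng.integers(0, 2, (n_tests, n_units))  # test x code-unit coverage
    fault_proneness = rng.random(n_units)              # stand-in: estimated from bug history

    # Score each test by the fault-proneness of the code it covers.
    scores = coverage @ fault_proneness

    # Cluster similar tests, then interleave clusters, best-scoring test first.
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(coverage)
    clusters = [sorted(np.flatnonzero(labels == c), key=lambda t: -scores[t])
                for c in range(3)]
    order = []
    while any(clusters):
        for c in clusters:
            if c:
                order.append(c.pop(0))
    print("prioritized test order:", order)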
Results: The proposed methods are evaluated on datasets collected from the
development history of five real-world projects including 357 versions in
total. The experiments show that the proposed methods are superior to
coverage-based TCP methods.
Conclusion: The proposed approach shows that coverage-based and
fault-proneness-based methods can be improved by combining diversification
with fault-proneness estimation.