
    Revisiting Process versus Product Metrics: a Large Scale Analysis

    Numerous methods can build predictive models from software data. However, what methods and conclusions should we endorse as we move from analytics in-the-small (dealing with a handful of projects) to analytics in-the-large (dealing with hundreds of projects)? To answer this question, we recheck prior small-scale results (about process versus product metrics for defect prediction and the granularity of metrics) using 722,471 commits from 700 GitHub projects. We find that some analytics in-the-small conclusions still hold when scaling up to analytics in-the-large. For example, like prior work, we see that process metrics are better predictors of defects than product metrics (the best process-/product-based learners achieve median recalls of 98%/44% and median AUCs of 95%/54%, respectively). That said, we warn that it is unwise to trust metric importance results from analytics in-the-small studies, since those change dramatically when moving to analytics in-the-large. Also, when reasoning in-the-large about hundreds of projects, it is better to use predictions from multiple models, since single-model predictions can become confused and exhibit high variance. Comment: 36 pages, 12 figures, and 5 tables.
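
    As an illustration of the last point, here is a minimal sketch of aggregating predictions from several defect-prediction learners instead of trusting a single model. The CSV layout, feature columns, and choice of learners below are illustrative assumptions, not taken from the paper.

        # Hypothetical sketch: average the predicted probabilities of several
        # learners so that project-level predictions are less sensitive to the
        # variance of any single model. Feature names are illustrative only.
        import numpy as np
        import pandas as pd
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import recall_score, roc_auc_score

        df = pd.read_csv("commits.csv")                       # hypothetical file
        X = df[["n_changes", "n_devs", "loc", "complexity"]]  # illustrative process/product features
        y = df["buggy"]

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

        models = [
            RandomForestClassifier(n_estimators=100, random_state=0),
            LogisticRegression(max_iter=1000),
            DecisionTreeClassifier(random_state=0),
        ]

        # Mean of per-model probabilities; single-model outputs can vary widely
        # when applied across hundreds of projects.
        probs = np.mean([m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1] for m in models], axis=0)
        preds = (probs >= 0.5).astype(int)

        print("recall:", recall_score(y_te, preds))
        print("AUC:   ", roc_auc_score(y_te, probs))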

    Is My Project's Truck Factor Low? Theoretical and Empirical Considerations About the Truck Factor Threshold

    The Truck Factor is a simple way, proposed by the agile community, to measure how knowledge of a system is distributed across a team of developers. It can be used to highlight potential project problems due to inadequate distribution of system knowledge. Notwithstanding its relevance, only a few studies have investigated the Truck Factor and proposed ways to efficiently measure, evaluate, and use it. In particular, the effective use of the Truck Factor is limited by the lack of reliable thresholds. In this preliminary paper, we present a theoretical model of the Truck Factor and, in particular, we investigate its use to define the maximum achievable Truck Factor value in a project. Such a value matters for defining a reliable threshold for the Truck Factor. Furthermore, we document an experiment in which we apply the proposed model to real software projects, with the aim of comparing the maximum achievable value of the Truck Factor with the only threshold proposed in the literature. The preliminary outcome shows that the existing threshold has some limitations and problems.
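
    For readers unfamiliar with the metric, here is a minimal sketch of one common way the Truck Factor is estimated (greedy removal of top authors until most files lose their authors). The authorship data structure is an assumption for illustration; this is not the theoretical model proposed in the paper.

        # Greedy Truck Factor estimation over a hypothetical file-authorship map.
        from collections import defaultdict

        def truck_factor(file_authors, coverage=0.5):
            """file_authors: dict mapping file path -> set of developers who 'own' it.
            Remove the developer authoring the most files, repeatedly, until fewer
            than `coverage` of the files still have at least one remaining author."""
            files = set(file_authors)
            remaining = {f: set(a) for f, a in file_authors.items()}
            removed = 0
            while True:
                covered = sum(1 for f in files if remaining[f])
                if covered < coverage * len(files):
                    return removed
                counts = defaultdict(int)
                for authors in remaining.values():
                    for dev in authors:
                        counts[dev] += 1
                if not counts:
                    return removed
                top = max(counts, key=counts.get)
                for authors in remaining.values():
                    authors.discard(top)
                removed += 1

        # Toy example: two developers sharing knowledge of three files.
        print(truck_factor({"a.py": {"alice"}, "b.py": {"alice", "bob"}, "c.py": {"bob"}}))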

    Evaluating Maintainability Prejudices with a Large-Scale Study of Open-Source Projects

    Exaggeration or context changes can turn maintainability experience into prejudice. For example, JavaScript is often seen as the least elegant language and hence the least maintainable. Such prejudice should not guide decisions without prior empirical validation. We formulated 10 hypotheses about maintainability based on such prejudices and tested them on a large set of open-source projects (6,897 GitHub repositories, 402 million lines, 5 programming languages). We operationalize maintainability with five static analysis metrics. We found that JavaScript code is not worse than other code, Java code shows higher maintainability than C# code, and C code has longer methods than other code. The quality of interface documentation is better in Java code than in other code. Code developed by teams is not of higher maintainability, and large code bases are not of lower maintainability. Projects with high maintainability are not more popular or more often forked. Overall, most hypotheses are not supported by open-source data. Comment: 20 pages.
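
    To make "operationalize maintainability with static analysis metrics" concrete, here is a small sketch computing one such metric (method length) for Python sources. The paper's actual metric suite, tooling, and the five languages it covers are not reproduced here; the repository path is hypothetical.

        # Illustrative static analysis: length in lines of every function/method.
        import ast
        import pathlib
        import statistics

        def method_lengths(path):
            """Return the line length of each function/method defined in a Python file."""
            tree = ast.parse(pathlib.Path(path).read_text(encoding="utf-8"))
            return [node.end_lineno - node.lineno + 1
                    for node in ast.walk(tree)
                    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))]

        # Median method length over a repository checkout (hypothetical path).
        lengths = [n for f in pathlib.Path("repo/").rglob("*.py") for n in method_lengths(f)]
        print("median method length:", statistics.median(lengths) if lengths else "n/a")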

    Rethinking Experiments in a Socio-Technical Perspective: The Case of Software Engineering

    Experiments in computing share many characteristics with the traditional experimental method, but also present significant differences from a practical perspective, due to their aim of producing software artifacts and the central role played by the human actors and organizations (e.g., programmers, project teams, software houses) involved in the software development process. By analyzing some of the most significant experiments in the subfield of software engineering, we aim to show how the conceptual framework that supports experimental methodology in this context needs to be extended in a socio-technical perspective.

    Towards identifying software project clusters with regard to defect prediction


    Test case prioritization using test case diversification and fault-proneness estimations

    Context: Regression testing activities greatly reduce the risk of releasing faulty software. However, the size of the test suite grows throughout the development process, resulting in time-consuming execution of the test suite and delayed feedback to the software development team. This has increased the need for approaches such as test case prioritization (TCP) and test-suite reduction to achieve better results with limited resources. In this regard, approaches that use auxiliary sources of data, such as bug history, are of particular interest. Objective: Our aim is to propose an approach for TCP that takes into account test case coverage data, bug history, and test case diversification. To evaluate this approach, we study its performance on real-world open-source projects. Method: The bug history is used to estimate the fault-proneness of source code areas. The diversification of test cases is preserved by incorporating fault-proneness into a clustering-based scheme. Results: The proposed methods are evaluated on datasets collected from the development history of five real-world projects, including 357 versions in total. The experiments show that the proposed methods are superior to coverage-based TCP methods. Conclusion: The proposed approach shows that coverage-based and fault-proneness-based methods can be improved by combining diversification with fault-proneness information.
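
    The general idea (cluster test cases by coverage for diversity, then order within clusters by estimated fault-proneness) can be sketched as follows. The matrix shapes, scoring rule, and round-robin ordering are assumptions for illustration, not the paper's concrete algorithm.

        # Rough TCP sketch: coverage-based clustering plus fault-proneness ordering.
        import numpy as np
        from sklearn.cluster import KMeans

        def prioritize(coverage, fault_proneness, n_clusters=3):
            """coverage: (n_tests, n_code_units) binary matrix.
            fault_proneness: per-code-unit scores estimated from bug history.
            Returns test indices in suggested execution order."""
            # Score each test by the fault-proneness of the code it covers.
            test_scores = coverage @ fault_proneness
            labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(coverage)

            # Within each cluster, sort tests by descending score.
            clusters = [sorted(np.where(labels == c)[0], key=lambda i: -test_scores[i])
                        for c in range(n_clusters)]

            # Round-robin across clusters keeps early picks diverse.
            order = []
            while any(clusters):
                for c in clusters:
                    if c:
                        order.append(int(c.pop(0)))
            return order

        # Toy example: 4 tests covering 5 code units.
        cov = np.array([[1, 0, 0, 1, 0], [0, 1, 1, 0, 0], [1, 1, 0, 0, 1], [0, 0, 1, 1, 1]])
        fp = np.array([0.9, 0.1, 0.5, 0.3, 0.7])
        print(prioritize(cov, fp, n_clusters=2))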