
    MEG: Multi-objective Ensemble Generation for Software Defect Prediction

    Background: Defect prediction research aims to assist software engineers in the early identification of software defects during the development process. A variety of automated approaches, ranging from traditional classification models to more sophisticated learning approaches, have been explored to this end. Among these, recent studies have proposed the use of ensemble prediction models (i.e., aggregations of multiple base classifiers) to build more robust defect prediction models.
    Aims: In this paper, we introduce a novel approach based on multi-objective evolutionary search to automatically generate defect prediction ensembles. Our proposal is not only novel with respect to the more general area of evolutionary generation of ensembles, but it also advances the state of the art in the use of ensembles for defect prediction.
    Method: We assess the effectiveness of our approach, dubbed Multi-objective Ensemble Generation (MEG), by empirically benchmarking it against the most closely related proposals we found in the literature on defect prediction ensembles and on multi-objective evolutionary ensembles (which, to the best of our knowledge, had never previously been applied to defect prediction).
    Results: MEG generates ensembles whose predictions are as accurate as, or more accurate than, those of all the other approaches considered in 73% of the cases (with favourable large effect sizes in 80% of them).
    Conclusions: MEG not only generates ensembles that yield more accurate defect predictions than the benchmarks considered, but does so automatically, relieving engineers of the burden of manual design and experimentation.
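
    The approach sketched in this abstract searches a space of candidate ensembles under several objectives at once. The minimal Python sketch below illustrates the general idea on a toy imbalanced dataset: it enumerates candidate ensembles (where an evolutionary approach like MEG would evolve them instead), scores each on two objectives, and keeps the Pareto-optimal ones. The dataset, base classifiers, and objectives are illustrative assumptions, not the paper's actual setup.

        # Illustrative sketch only, not the paper's MEG algorithm.
        from itertools import combinations

        from sklearn.datasets import make_classification
        from sklearn.ensemble import VotingClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import recall_score
        from sklearn.model_selection import train_test_split
        from sklearn.naive_bayes import GaussianNB
        from sklearn.tree import DecisionTreeClassifier

        # Toy imbalanced dataset standing in for a defect dataset.
        X, y = make_classification(n_samples=400, weights=[0.8], random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        base = [("lr", LogisticRegression(max_iter=1000)),
                ("nb", GaussianNB()),
                ("dt", DecisionTreeClassifier(random_state=0))]

        def objectives(members):
            # Two objectives: recall on defective (1) and non-defective (0) modules.
            ens = VotingClassifier(estimators=list(members)).fit(X_tr, y_tr)
            pred = ens.predict(X_te)
            return (recall_score(y_te, pred), recall_score(y_te, pred, pos_label=0))

        # Enumerate all non-empty subsets; an evolutionary search would sample
        # and mutate candidates rather than enumerate them exhaustively.
        scored = [(objectives(c), [name for name, _ in c])
                  for r in range(1, len(base) + 1)
                  for c in combinations(base, r)]

        # Keep the Pareto front: ensembles no other candidate beats on both objectives.
        front = [(s, names) for s, names in scored
                 if not any(o[0] >= s[0] and o[1] >= s[1] and o != s
                            for o, _ in scored)]
        print(front)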

    A Coupling and Cohesion Metrics Suite for

    The increasing need for software quality measurement has led to extensive research into software metrics and the development of software metric tools. To maintain high-quality software, developers need to strive for a loosely coupled and highly cohesive design. One of the many properties considered when measuring coupling and cohesion is the type of relationships that make them up. What these specific relationships are is widely understood and accepted by researchers and practitioners; however, different researchers base their metrics on different subsets of these relationships. Studies have shown that, because individual coupling and cohesion metrics each cover overlapping subsets of relationships, the measures tend to correlate with one another. Validation of these metrics against the maintainability index of a Java program suggested that there is high multicollinearity among coupling and cohesion metrics. This research introduces an approach to implementing coupling and cohesion metrics in which every possible relationship is considered and, for each, the question of whether it has a significant effect on maintainability index prediction is addressed. The orthogonality of the selected metrics is then assessed by means of principal component analysis. The investigation suggests that some of the metrics form an independent set, while others measure similar dimensions.
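
    Since the abstract relies on principal component analysis to test whether the selected metrics are orthogonal, here is a minimal Python sketch of that check on synthetic data; the four "metrics" are made up, and one is deliberately a near-copy of another so the collinearity shows.

        # Sketch of the PCA orthogonality check on made-up metric values.
        import numpy as np
        from sklearn.decomposition import PCA

        # Hypothetical per-class measurements for four metrics; metric D is
        # almost a linear copy of metric A, so we expect fewer effective
        # dimensions than metrics.
        rng = np.random.default_rng(0)
        A = rng.normal(size=50)
        B = rng.normal(size=50)
        C = rng.normal(size=50)
        D = 0.9 * A + rng.normal(scale=0.1, size=50)
        metrics = np.column_stack([A, B, C, D])

        pca = PCA().fit(metrics)
        print(pca.explained_variance_ratio_)
        # If the first few components explain nearly all the variance, the
        # metrics are multicollinear rather than orthogonal.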

    The impact of using biased performance metrics on software defect prediction research

    Context: Software engineering researchers have undertaken many experiments investigating the potential of software defect prediction algorithms. Unfortunately, some performance metrics are known to be problematic, most notably F1, yet it continues to be widely used.
    Objective: To investigate the potential impact of using F1 on the validity of this large body of research.
    Method: We undertook a systematic review to locate relevant experiments and then extracted all pairwise comparisons of defect prediction performance using F1 and the unbiased Matthews correlation coefficient (MCC).
    Results: We found a total of 38 primary studies containing 12,471 pairs of results. Of these, 21.95% changed direction when the MCC metric was used instead of the biased F1 metric. Unfortunately, we also found evidence suggesting that F1 remains widely used in software defect prediction research.
    Conclusions: We reiterate the concerns of statisticians that F1 is a problematic metric outside of an information retrieval context, since in defect prediction we care about both classes (defect-prone and not defect-prone units). This inappropriate usage has led to a substantial number (more than one fifth) of results that are erroneous in terms of direction. We therefore urge researchers to (i) use an unbiased metric and (ii) publish detailed results, including confusion matrices, so that alternative analyses become possible.
    Comment: Submitted to the journal Information & Software Technology. It is a greatly extended version of "Assessing Software Defection Prediction Performance: Why Using the Matthews Correlation Coefficient Matters", presented at EASE 202
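
    The bias the authors describe is easy to see numerically: F1 never uses the true-negative cell of the confusion matrix, while MCC uses all four cells. The short Python example below, with made-up confusion matrices, shows two classifiers with identical F1 but very different MCC.

        # Worked example of F1's blind spot, using invented confusion matrices.
        from math import sqrt

        def f1(tp, fp, fn, tn):
            return 2 * tp / (2 * tp + fp + fn)  # tn never appears

        def mcc(tp, fp, fn, tn):
            denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
            return (tp * tn - fp * fn) / denom if denom else 0.0

        # Two classifiers differing only in how many clean modules they
        # correctly reject (the tn cell).
        print(f1(tp=40, fp=20, fn=20, tn=920), mcc(tp=40, fp=20, fn=20, tn=920))
        print(f1(tp=40, fp=20, fn=20, tn=20),  mcc(tp=40, fp=20, fn=20, tn=20))

    Both rows have F1 of about 0.67, but MCC drops from roughly 0.65 to roughly 0.17 once the classifier stops recognising not-defect-prone modules, which is exactly the kind of direction-changing difference the review measures.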

    Is One Hyperparameter Optimizer Enough?

    Hyperparameter tuning is the black art of automatically finding a good combination of control parameters for a data miner. While widely applied in empirical software engineering, there has not been much discussion of which hyperparameter tuner is best for software analytics. To address this gap in the literature, this paper applied a range of hyperparameter optimizers (grid search, random search, differential evolution, and Bayesian optimization) to the defect prediction problem. Surprisingly, no hyperparameter optimizer was observed to be 'best' and, for one of the two evaluation measures studied here (F-measure), hyperparameter optimization was no better than using default configurations in 50% of cases. We conclude that hyperparameter optimization is more nuanced than previously believed. While such optimization can certainly lead to large improvements in the performance of classifiers used in software analytics, it remains to be seen which specific optimizers should be applied to a new dataset.
    Comment: 7 pages, 2 columns, accepted for SWAN1
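
    As a concrete illustration of running more than one tuner, the Python sketch below compares two of the optimizers named above (grid search and random search) on a toy task; the dataset, classifier, and parameter ranges are assumptions for illustration, not the paper's experimental setup.

        # Sketch: two tuners on the same learner can pick different winners.
        from scipy.stats import randint
        from sklearn.datasets import make_classification
        from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
        from sklearn.tree import DecisionTreeClassifier

        # Toy imbalanced dataset standing in for a defect dataset.
        X, y = make_classification(n_samples=500, weights=[0.8], random_state=1)

        grid = GridSearchCV(DecisionTreeClassifier(random_state=1),
                            {"max_depth": [3, 5, 10],
                             "min_samples_leaf": [1, 5, 10]},
                            scoring="f1").fit(X, y)
        rand = RandomizedSearchCV(DecisionTreeClassifier(random_state=1),
                                  {"max_depth": randint(2, 12),
                                   "min_samples_leaf": randint(1, 12)},
                                  n_iter=9, scoring="f1",
                                  random_state=1).fit(X, y)

        # Different tuners often land on different configurations with
        # similar scores, consistent with no single optimizer dominating.
        print(grid.best_params_, round(grid.best_score_, 3))
        print(rand.best_params_, round(rand.best_score_, 3))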