MEG: Multi-objective Ensemble Generation for Software Defect Prediction
Background: Defect prediction research aims at assisting software
engineers in the early identification of software defects during the
development process. A variety of automated approaches, ranging from traditional classification models to more sophisticated
learning approaches, have been explored to this end. Among these,
recent studies have proposed the use of ensemble prediction models
(i.e., aggregations of multiple base classifiers) to build more robust
defect prediction models.
Aims: In this paper, we introduce a novel
approach based on multi-objective evolutionary search to automatically generate defect prediction ensembles. Our proposal is not
only novel with respect to the more general area of evolutionary
generation of ensembles, but it also advances the state of the art
in the use of ensembles in defect prediction.
Method: We assess
the effectiveness of our approach, dubbed Multi-objective
Ensemble Generation (MEG), by empirically benchmarking it
against the most closely related proposals we found in the literature
on defect prediction ensembles and on multi-objective evolutionary
ensembles (which, to the best of our knowledge, had never been
previously applied to tackle defect prediction).
Results: Our results
show that MEG is able to generate ensembles that produce similar
or more accurate predictions than those achieved by all the other
approaches considered in 73% of the cases (with favourable large
effect sizes in 80% of them).
Conclusions: MEG is not only able
to generate ensembles that yield more accurate defect predictions
with respect to the benchmarks considered, but it also does so automatically, thus relieving engineers of the burden of manual
design and experimentation.
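The multi-objective search described above can be sketched in miniature: candidate ensembles are encoded as bitmasks over a pool of base classifiers and evolved under two competing objectives (maximize accuracy, minimize ensemble size). Everything below, including the classifier pool, the toy accuracy surrogate, and the one-bit mutation scheme, is a hypothetical illustration of Pareto-based selection, not MEG's actual encoding or objectives.

```python
import random

random.seed(0)

POOL = 8  # number of base classifiers in the hypothetical pool
# Toy stand-in for each base classifier's cross-validated accuracy.
QUALITY = [0.60, 0.62, 0.55, 0.70, 0.58, 0.65, 0.61, 0.59]

def accuracy(mask):
    # Toy surrogate objective: the mean quality of the ensemble's members
    # (a real system would evaluate the aggregated ensemble on held-out data).
    members = [QUALITY[i] for i in range(POOL) if mask >> i & 1]
    return sum(members) / len(members)

def size(mask):
    return bin(mask).count("1")

def dominates(a, b):
    """True if ensemble a is at least as good on both objectives
    (higher accuracy, smaller size) and strictly better on one."""
    no_worse = accuracy(a) >= accuracy(b) and size(a) <= size(b)
    strictly = accuracy(a) > accuracy(b) or size(a) < size(b)
    return no_worse and strictly

def pareto_front(population):
    # Keep only the non-dominated ensembles.
    return [p for p in population
            if not any(dominates(q, p) for q in population if q != p)]

def evolve(generations=40, pop_size=20):
    pop = [random.randrange(1, 2 ** POOL) for _ in range(pop_size)]
    for _ in range(generations):
        parent = random.choice(pop)
        child = parent ^ (1 << random.randrange(POOL))  # flip one member in/out
        if child:  # keep at least one classifier in the ensemble
            pop.append(child)
        pop = pareto_front(pop)  # environmental selection
    return pop

front = evolve()
```

The returned front is the set of trade-off ensembles found by the search; a real multi-objective evolutionary algorithm such as NSGA-II would additionally use crossover and crowding-distance selection to preserve diversity along the front.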
A COUPLING AND COHESION METRICS SUITE FOR
The increasing need for software quality measurements has led to extensive research
into software metrics and the development of software metric tools. To maintain high
quality software, developers need to strive for a low-coupled and highly cohesive
design. One of many properties considered when measuring coupling and cohesion is the
type of relationships that make up coupling and cohesion. What these specific
relationships are is widely understood and accepted by researchers and practitioners.
However, different researchers base their metrics on different subsets of these
relationships.
Studies have shown that, because measures of coupling and cohesion include multiple
overlapping subsets of relationships, the resulting metrics tend to correlate with
each other. Validation of these metrics against the maintainability index of a Java program
suggested that there is high multicollinearity among coupling and cohesion metrics.
This research introduces an approach to implementing coupling and cohesion
metrics. Every possible relationship is considered and, for each, we address
whether or not it has a significant effect on maintainability index prediction. The
orthogonality of the selected metrics is assessed by means of principal component
analysis. The investigation suggested that some of the metrics form an independent set,
while others measure a similar dimension.
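The principal-component check described above can be illustrated with two invented metric columns: for two variables, PCA reduces to the eigenvalues of the 2x2 correlation matrix, which are 1 + r and 1 - r, so a correlation near 1 means a single component captures almost all the variance (multicollinearity). The metric names and values below are hypothetical, not the thesis's actual suite.

```python
import math

# Hypothetical per-class measurements for two coupling metrics.
coupling_a = [2, 4, 6, 8, 10, 12]   # e.g. method-call coupling (invented)
coupling_b = [3, 5, 7, 8, 11, 13]   # e.g. field-access coupling (invented)

def pearson(x, y):
    # Pearson correlation coefficient, computed from first principles.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

r = pearson(coupling_a, coupling_b)
# The 2x2 correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + r and 1 - r,
# so the first principal component explains (1 + |r|) / 2 of the total variance.
explained = (1 + abs(r)) / 2
```

With these two nearly collinear columns the first component explains well over 95% of the variance, i.e. the two metrics measure essentially one dimension; orthogonal metrics would instead split the variance roughly evenly across components.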
The impact of using biased performance metrics on software defect prediction research
Context: Software engineering researchers have undertaken many experiments
investigating the potential of software defect prediction algorithms.
Unfortunately, some widely used performance metrics are known to be
problematic, most notably F1, yet it nevertheless remains in widespread use.
Objective: To investigate the potential impact of using F1 on the validity of
this large body of research.
Method: We undertook a systematic review to locate relevant experiments and
then extracted all pairwise comparisons of defect prediction performance using F1
and the unbiased Matthews correlation coefficient (MCC).
Results: We found a total of 38 primary studies. These contain 12,471 pairs
of results. Of these, 21.95% changed direction when the MCC metric was used
instead of the biased F1 metric. Unfortunately, we also found evidence
suggesting that F1 remains widely used in software defect prediction research.
Conclusions: We reiterate the concerns of statisticians that the F1 is a
problematic metric outside of an information retrieval context, since we are
concerned about both classes (defect-prone and not defect-prone units). This
inappropriate usage has led to a substantial number (more than one fifth) of
erroneous (in terms of direction) results. Therefore we urge researchers to (i)
use an unbiased metric and (ii) publish detailed results including confusion
matrices such that alternative analyses become possible.
Comment: Submitted to the journal Information & Software Technology. It is a
greatly extended version of "Assessing Software Defection Prediction
Performance: Why Using the Matthews Correlation Coefficient Matters"
presented at EASE 202
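The direction changes the review reports can be reproduced in miniature: F1 ignores true negatives, so two classifiers can rank one way under F1 and the opposite way under MCC. The confusion-matrix counts below are fabricated for illustration; the metric formulas are the standard ones.

```python
import math

def f1(tp, fp, fn, tn):
    # F1 is the harmonic mean of precision and recall; note tn never appears.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def mcc(tp, fp, fn, tn):
    # Matthews correlation coefficient: uses all four confusion-matrix cells.
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den

# Two hypothetical classifiers evaluated on the same dataset
# (100 defect-prone units, 60 clean units).
A = dict(tp=90, fp=50, fn=10, tn=10)  # high recall, poor on the clean class
B = dict(tp=60, fp=10, fn=40, tn=50)  # fewer hits, far fewer false alarms

f1_a, f1_b = f1(**A), f1(**B)      # F1 ranks A above B
mcc_a, mcc_b = mcc(**A), mcc(**B)  # MCC ranks B above A: the direction flips
```

Here F1 rewards A's high recall and never sees its near-total failure on clean units, while MCC penalises it, which is exactly the kind of direction change the paper quantifies across the literature.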
Is One Hyperparameter Optimizer Enough?
Hyperparameter tuning is the black art of automatically finding a good
combination of control parameters for a data miner. While widely applied in
empirical Software Engineering, there has not been much discussion on which
hyperparameter tuner is best for software analytics. To address this gap in the
literature, this paper applied a range of hyperparameter optimizers (grid
search, random search, differential evolution, and Bayesian optimization) to
the defect prediction problem. Surprisingly, no hyperparameter optimizer was
observed to be 'best' and, for one of the two evaluation measures studied here
(F-measure), hyperparameter optimization was in 50% of cases no better than
using default configurations.
We conclude that hyperparameter optimization is more nuanced than previously
believed. While such optimization can certainly lead to large improvements in
the performance of classifiers used in software analytics, it remains to be
seen which specific optimizers should be applied to a new dataset.
Comment: 7 pages, 2 columns, accepted for SWAN1
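One of the tuners the paper compares, random search, is simple enough to sketch end to end: sample configurations uniformly from the hyperparameter space and keep the best-scoring one. The learner, its two hyperparameters, the default configuration, and the toy score surface below are all hypothetical stand-ins for a real cross-validated defect predictor.

```python
import random

random.seed(1)

def score(max_depth, min_samples):
    # Invented response surface with an optimum near (8, 4); a real tuner
    # would instead run cross-validation of a classifier with these settings.
    return 1.0 - 0.01 * (max_depth - 8) ** 2 - 0.02 * (min_samples - 4) ** 2

DEFAULT = (3, 2)  # a hypothetical default configuration

def random_search(trials=50):
    # Draw configurations uniformly at random and keep the best one seen.
    best_cfg, best = None, float("-inf")
    for _ in range(trials):
        cfg = (random.randint(1, 20), random.randint(1, 10))
        s = score(*cfg)
        if s > best:
            best_cfg, best = cfg, s
    return best_cfg, best

tuned_cfg, tuned = random_search()
```

On this toy surface random search easily beats the hypothetical default; the paper's point is precisely that on real defect-prediction data this advantage is not guaranteed, and that no single tuner (grid, random, differential evolution, or Bayesian) dominated across datasets and measures.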