31 research outputs found
Is One Hyperparameter Optimizer Enough?
Hyperparameter tuning is the black art of automatically finding a good
combination of control parameters for a data miner. While widely applied in
empirical Software Engineering, there has not been much discussion on which
hyperparameter tuner is best for software analytics. To address this gap in the
literature, this paper applied a range of hyperparameter optimizers (grid
search, random search, differential evolution, and Bayesian optimization) to
defect prediction problem. Surprisingly, no hyperparameter optimizer was
observed to be `best' and, for one of the two evaluation measures studied here
(F-measure), hyperparameter optimization, in 50\% cases, was no better than
using default configurations.
We conclude that hyperparameter optimization is more nuanced than previously
believed. While such optimization can certainly lead to large improvements in
the performance of classifiers used in software analytics, it remains to be
seen which specific optimizers should be applied to a new dataset.Comment: 7 pages, 2 columns, accepted for SWAN1
A Hybrid Multi-Filter Wrapper Feature Selection Method for Software Defect Predictors
Software Defect Prediction (SDP) is an approach used for identifying defect-prone software modules or components. It helps software engineer to optimally, allocate limited resources to defective software modules or components in the testing or maintenance phases of software development life cycle (SDLC). Nonetheless, the predictive performance of SDP models reckons largely on the quality of dataset utilized for training the predictive models. The high dimensionality of software metric features has been noted as a data quality problem which negatively affects the predictive performance of SDP models. Feature Selection (FS) is a well-known method for solving high dimensionality problem and can be divided into filter-based and wrapper-based methods. Filter-based FS has low computational cost, but the predictive performance of its classification algorithm on the filtered data cannot be guaranteed. On the contrary, wrapper-based FS have good predictive performance but with high computational cost and lack of generalizability. Therefore, this study proposes a hybrid multi-filter wrapper method for feature selection of relevant and irredundant features in software defect prediction. The proposed hybrid feature selection will be developed to take advantage of filter-filter and filter-wrapper relationships to give optimal feature subsets, reduce its evaluation cycle and subsequently improve SDP models overall predictive performance in terms of Accuracy, Precision and Recall values
Connecting Software Metrics across Versions to Predict Defects
Accurate software defect prediction could help software practitioners
allocate test resources to defect-prone modules effectively and efficiently. In
the last decades, much effort has been devoted to build accurate defect
prediction models, including developing quality defect predictors and modeling
techniques. However, current widely used defect predictors such as code metrics
and process metrics could not well describe how software modules change over
the project evolution, which we believe is important for defect prediction. In
order to deal with this problem, in this paper, we propose to use the
Historical Version Sequence of Metrics (HVSM) in continuous software versions
as defect predictors. Furthermore, we leverage Recurrent Neural Network (RNN),
a popular modeling technique, to take HVSM as the input to build software
prediction models. The experimental results show that, in most cases, the
proposed HVSM-based RNN model has a significantly better effort-aware ranking
effectiveness than the commonly used baseline models
Are Delayed Issues Harder to Resolve? Revisiting Cost-to-Fix of Defects throughout the Lifecycle
Many practitioners and academics believe in a delayed issue effect (DIE);
i.e. the longer an issue lingers in the system, the more effort it requires to
resolve. This belief is often used to justify major investments in new
development processes that promise to retire more issues sooner.
This paper tests for the delayed issue effect in 171 software projects
conducted around the world in the period from 2006--2014. To the best of our
knowledge, this is the largest study yet published on this effect. We found no
evidence for the delayed issue effect; i.e. the effort to resolve issues in a
later phase was not consistently or substantially greater than when issues were
resolved soon after their introduction.
This paper documents the above study and explores reasons for this mismatch
between this common rule of thumb and empirical data. In summary, DIE is not
some constant across all projects. Rather, DIE might be an historical relic
that occurs intermittently only in certain kinds of projects. This is a
significant result since it predicts that new development processes that
promise to faster retire more issues will not have a guaranteed return on
investment (depending on the context where applied), and that a long-held truth
in software engineering should not be considered a global truism.Comment: 31 pages. Accepted with minor revisions to Journal of Empirical
Software Engineering. Keywords: software economics, phase delay, cost to fi