31,613 research outputs found
Is One Hyperparameter Optimizer Enough?
Hyperparameter tuning is the black art of automatically finding a good
combination of control parameters for a data miner. While widely applied in
empirical Software Engineering, there has not been much discussion on which
hyperparameter tuner is best for software analytics. To address this gap in the
literature, this paper applied a range of hyperparameter optimizers (grid
search, random search, differential evolution, and Bayesian optimization) to
defect prediction problem. Surprisingly, no hyperparameter optimizer was
observed to be `best' and, for one of the two evaluation measures studied here
(F-measure), hyperparameter optimization, in 50\% cases, was no better than
using default configurations.
We conclude that hyperparameter optimization is more nuanced than previously
believed. While such optimization can certainly lead to large improvements in
the performance of classifiers used in software analytics, it remains to be
seen which specific optimizers should be applied to a new dataset.Comment: 7 pages, 2 columns, accepted for SWAN1
Learning Effective Changes for Software Projects
The primary motivation of much of software analytics is decision making. How
to make these decisions? Should one make decisions based on lessons that arise
from within a particular project? Or should one generate these decisions from
across multiple projects? This work is an attempt to answer these questions.
Our work was motivated by a realization that much of the current generation
software analytics tools focus primarily on prediction. Indeed prediction is a
useful task, but it is usually followed by "planning" about what actions need
to be taken. This research seeks to address the planning task by seeking
methods that support actionable analytics that offer clear guidance on what to
do. Specifically, we propose XTREE and BELLTREE algorithms for generating a set
of actionable plans within and across projects. Each of these plans, if
followed will improve the quality of the software project.Comment: 4 pages, 2 figures. This a submission for ASE 2017 Doctoral Symposiu
Are Delayed Issues Harder to Resolve? Revisiting Cost-to-Fix of Defects throughout the Lifecycle
Many practitioners and academics believe in a delayed issue effect (DIE);
i.e. the longer an issue lingers in the system, the more effort it requires to
resolve. This belief is often used to justify major investments in new
development processes that promise to retire more issues sooner.
This paper tests for the delayed issue effect in 171 software projects
conducted around the world in the period from 2006--2014. To the best of our
knowledge, this is the largest study yet published on this effect. We found no
evidence for the delayed issue effect; i.e. the effort to resolve issues in a
later phase was not consistently or substantially greater than when issues were
resolved soon after their introduction.
This paper documents the above study and explores reasons for this mismatch
between this common rule of thumb and empirical data. In summary, DIE is not
some constant across all projects. Rather, DIE might be an historical relic
that occurs intermittently only in certain kinds of projects. This is a
significant result since it predicts that new development processes that
promise to faster retire more issues will not have a guaranteed return on
investment (depending on the context where applied), and that a long-held truth
in software engineering should not be considered a global truism.Comment: 31 pages. Accepted with minor revisions to Journal of Empirical
Software Engineering. Keywords: software economics, phase delay, cost to fi
What is the Connection Between Issues, Bugs, and Enhancements? (Lessons Learned from 800+ Software Projects)
Agile teams juggle multiple tasks so professionals are often assigned to
multiple projects, especially in service organizations that monitor and
maintain a large suite of software for a large user base. If we could predict
changes in project conditions changes, then managers could better adjust the
staff allocated to those projects.This paper builds such a predictor using data
from 832 open source and proprietary applications. Using a time series analysis
of the last 4 months of issues, we can forecast how many bug reports and
enhancement requests will be generated next month. The forecasts made in this
way only require a frequency count of this issue reports (and do not require an
historical record of bugs found in the project). That is, this kind of
predictive model is very easy to deploy within a project. We hence strongly
recommend this method for forecasting future issues, enhancements, and bugs in
a project.Comment: Accepted to 2018 International Conference on Software Engineering, at
the software engineering in practice track. 10 pages, 10 figure
Towards Automated Performance Bug Identification in Python
Context: Software performance is a critical non-functional requirement,
appearing in many fields such as mission critical applications, financial, and
real time systems. In this work we focused on early detection of performance
bugs; our software under study was a real time system used in the
advertisement/marketing domain.
Goal: Find a simple and easy to implement solution, predicting performance
bugs.
Method: We built several models using four machine learning methods, commonly
used for defect prediction: C4.5 Decision Trees, Na\"{\i}ve Bayes, Bayesian
Networks, and Logistic Regression.
Results: Our empirical results show that a C4.5 model, using lines of code
changed, file's age and size as explanatory variables, can be used to predict
performance bugs (recall=0.73, accuracy=0.85, and precision=0.96). We show that
reducing the number of changes delivered on a commit, can decrease the chance
of performance bug injection.
Conclusions: We believe that our approach can help practitioners to eliminate
performance bugs early in the development cycle. Our results are also of
interest to theoreticians, establishing a link between functional bugs and
(non-functional) performance bugs, and explicitly showing that attributes used
for prediction of functional bugs can be used for prediction of performance
bugs
- âŠ