Search CORE

83 research outputs found

Continuous Defect Prediction: The Idea and a Related Dataset

Author: Kawalerowicz Marcin
Madeyski Lech
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 22/06/2017
Field of study

We would like to present the idea of our Continuous Defect Prediction (CDP) research and a related dataset that we created and share. Our dataset is currently a set of more than 11 million data rows, representing files involved in Continuous Integration (CI) builds, that synthesize the results of CI builds with data we mine from software repositories. Our dataset embraces 1265 software projects, 30,022 distinct commit authors and several software process metrics that in earlier research appeared to be useful in software defect prediction. In this particular dataset we use TravisTorrent as the source of CI data. TravisTorrent synthesizes commit level information from the Travis CI server and GitHub open-source projects repositories. We extend this data to a file change level and calculate the software process metrics that may be used, for example, as features to predict risky software changes that could break the build if committed to a repository with CI enabled.Comment: Lech Madeyski and Marcin Kawalerowicz. "Continuous Defect Prediction: The Idea and a Related Dataset" In: 14th International Conference on Mining Software Repositories (MSR'17). Buenos Aires. 2017, pp. 515-518. doi: 10.1109/MSR.2017.46. URL: http://madeyski.e-informatyka.pl/download/MadeyskiKawalerowiczMSR17.pd

arXiv.org e-Print Archive

Crossref

SZZ Unleashed: An Open Implementation of the SZZ Algorithm -- Featuring Example Usage in a Study of Just-in-Time Bug Prediction for the Jenkins Project

Author: Berg Kristian
Borg Markus
Hansson Daniel
Svensson Oscar
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019
Field of study

Numerous empirical software engineering studies rely on detailed information about bugs. While issue trackers often contain information about when bugs were fixed, details about when they were introduced to the system are often absent. As a remedy, researchers often rely on the SZZ algorithm as a heuristic approach to identify bug-introducing software changes. Unfortunately, as reported in a recent systematic literature review, few researchers have made their SZZ implementations publicly available. Consequently, there is a risk that research effort is wasted as new projects based on SZZ output need to initially reimplement the approach. Furthermore, there is a risk that newly developed (closed source) SZZ implementations have not been properly tested, thus conducting research based on their output might introduce threats to validity. We present SZZ Unleashed, an open implementation of the SZZ algorithm for git repositories. This paper describes our implementation along with a usage example for the Jenkins project, and conclude with an illustrative study on just-in-time bug prediction. We hope to continue evolving SZZ Unleashed on GitHub, and warmly invite the community to contribute

arXiv.org e-Print Archive

Crossref

Is One Hyperparameter Optimizer Enough?

Author: Bergstra J.
Bergstra J.
Lewis C.
Menzies T.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 02/10/2018
Field of study

Hyperparameter tuning is the black art of automatically finding a good combination of control parameters for a data miner. While widely applied in empirical Software Engineering, there has not been much discussion on which hyperparameter tuner is best for software analytics. To address this gap in the literature, this paper applied a range of hyperparameter optimizers (grid search, random search, differential evolution, and Bayesian optimization) to defect prediction problem. Surprisingly, no hyperparameter optimizer was observed to be `best' and, for one of the two evaluation measures studied here (F-measure), hyperparameter optimization, in 50\% cases, was no better than using default configurations. We conclude that hyperparameter optimization is more nuanced than previously believed. While such optimization can certainly lead to large improvements in the performance of classifiers used in software analytics, it remains to be seen which specific optimizers should be applied to a new dataset.Comment: 7 pages, 2 columns, accepted for SWAN1

arXiv.org e-Print Archive

Crossref

Is It Safe to Uplift This Patch? An Empirical Study on Mozilla Firefox

Author: An Le
Castelluccio Marco
Khomh Foutse
Publication venue
Publication date: 01/01/2017
Field of study

In rapid release development processes, patches that fix critical issues, or implement high-value features are often promoted directly from the development channel to a stabilization channel, potentially skipping one or more stabilization channels. This practice is called patch uplift. Patch uplift is risky, because patches that are rushed through the stabilization phase can end up introducing regressions in the code. This paper examines patch uplift operations at Mozilla, with the aim to identify the characteristics of uplifted patches that introduce regressions. Through statistical and manual analyses, we quantitatively and qualitatively investigate the reasons behind patch uplift decisions and the characteristics of uplifted patches that introduced regressions. Additionally, we interviewed three Mozilla release managers to understand organizational factors that affect patch uplift decisions and outcomes. Results show that most patches are uplifted because of a wrong functionality or a crash. Uplifted patches that lead to faults tend to have larger patch size, and most of the faults are due to semantic or memory errors in the patches. Also, release managers are more inclined to accept patch uplift requests that concern certain specific components, and-or that are submitted by certain specific developers.Comment: In proceedings of the 33rd International Conference on Software Maintenance and Evolution (ICSME 2017

arXiv.org e-Print Archive

Crossref

PolyPublie

Connecting Software Metrics across Versions to Predict Defects

Author: Guo Jianbo
Li Yanhui
Liu Yibin
Xu Baowen
Zhou Yuming
Publication venue
Publication date: 28/12/2017
Field of study

Accurate software defect prediction could help software practitioners allocate test resources to defect-prone modules effectively and efficiently. In the last decades, much effort has been devoted to build accurate defect prediction models, including developing quality defect predictors and modeling techniques. However, current widely used defect predictors such as code metrics and process metrics could not well describe how software modules change over the project evolution, which we believe is important for defect prediction. In order to deal with this problem, in this paper, we propose to use the Historical Version Sequence of Metrics (HVSM) in continuous software versions as defect predictors. Furthermore, we leverage Recurrent Neural Network (RNN), a popular modeling technique, to take HVSM as the input to build software prediction models. The experimental results show that, in most cases, the proposed HVSM-based RNN model has a significantly better effort-aware ranking effectiveness than the commonly used baseline models

arXiv.org e-Print Archive

Crossref

A DEEP ENSEMBLE LEARNING METHOD FOR EFFORT-AWARE JUST-IN-TIME DEFECT PREDICTION

Author: ALBAHLI Saleh
Publication venue: Lublin University of Technology
Publication date: 20/11/2019
Field of study

Nowadays, logistics for transportation and distribution of merchandise are a key element to increase the competitiveness of companies. However, the election of alternative routes outside the panned routes causes the logistic companies to provide a poor-quality service, with units that endanger the appropriate deliver of merchandise and impacting negatively the way in which the supply chain works. This paper aims to develop a module that allows the processing, analysis and deployment of satellite information oriented to the pattern analysis, to find anomalies in the paths of the operators by implementing the algorithm TODS, to be able to help in the decision making. The experimental results show that the algorithm detects optimally the abnormal routes using historical data as a base

Multidisciplinary Digital Publishing Institute

Biblioteka Nauki - repozytorium artykuÅÃ³w

Lublin University of Technology Journals