    Does UML modeling associate with lower defect proneness? A preliminary empirical investigation

    Get PDF
    The benefits of modeling the design to improve the quality and maintainability of software systems have long been advocated and recognized. Yet, the empirical evidence on this remains scarce. In this paper, we fill this gap by reporting on an empirical study of the relationship between UML modeling and software defect proneness in a large sample of open-source GitHub projects. Using statistical modeling, and controlling for confounding variables, we show that projects containing traces of UML models in their repositories experience, on average, only a minor statistical difference in the number of software defects (as mined from their issue trackers) compared to projects without traces of UML models.
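    As a rough illustration of the kind of statistical modeling with confounder control the abstract refers to, the sketch below fits a count regression of per-project defects on a UML-presence indicator. The input file, the column names (defects, has_uml, loc, age_years, contributors), and the choice of a Poisson model are assumptions for illustration, not the authors' exact setup.

```python
# Minimal sketch (not the authors' exact setup): regress per-project defect
# counts on a UML-presence indicator while controlling for confounders.
# The input file and column names are hypothetical placeholders.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

projects = pd.read_csv("projects.csv")  # one row per project (hypothetical)

# Poisson regression is a common choice for count outcomes; a negative
# binomial model would be an alternative if the counts are over-dispersed.
model = smf.poisson(
    "defects ~ has_uml + np.log(loc) + age_years + contributors",
    data=projects,
)
result = model.fit()
print(result.summary())  # the has_uml coefficient captures the association
```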

    Snoring: A Noise in Defect Prediction Datasets

    Get PDF
    Defect prediction aims at identifying software artifacts that are likely to exhibit a defect. The main purpose of defect prediction is to reduce the cost of testing and code review, by letting developers focus on specific artifacts. Several researchers have worked on improving the accuracy of defect estimation models using techniques such as tuning, re-balancing, or feature selection. Ultimately, the reliability of a prediction model depends on the quality of the dataset. Therefore, effort has been spent on identifying sources of noise in the datasets and on how to deal with them, including defect misclassification and defect origin. A key component of defect prediction approaches is the attribution of a defect to a project's release. Although developers might be able to attribute a defect to a specific release, in most cases a defect is attributed to the release after which the defect has been discovered. However, in many circumstances a defect is only discovered several releases after its introduction. This might introduce a bias in the dataset, i.e., treating the intermediate releases as defect-free and the latter as defect-prone. We call this phenomenon a “sleeping defect”. We call “snoring” the phenomenon in which classes are affected only by sleeping defects and hence would be treated as defect-free until the defect is discovered. In this work, we analyze data from more than 4,000 bugs and 600 releases of 20 open source projects from the Apache ecosystem to investigate: 1) the magnitude of the sleeping defects, 2) the magnitude of the snoring classes, 3) whether snoring impacts the evaluation of classifiers, 4) whether snoring impacts classifier accuracy, and 5) whether removing the last releases of data is beneficial in reducing the negative impact of the snoring noise on classifier accuracy. Our results show that, on average across projects: 1) most of the defects in a project slept for more than 19% of the existing releases, 2) the missing rate is more than 50% unless we remove more than 20% of the releases, 3) the relative error in measuring classifier accuracy with a dataset affected by snoring is about 100% in all accuracy metrics other than AUC, 4) the presence of snoring decreases the accuracy of each of the 15 classifiers in each of the 6 accuracy metrics; for instance, Recall, F1, Kappa and Matthews decrease by about 80%, and 5) removing one release of data is better than removing no data in all accuracy metrics; for instance, Recall, F1, Kappa and Matthews increase by about 30%.
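    To make the “sleeping” and “snoring” notions concrete, the sketch below labels, for a given release, the classes that a snapshot of the dataset would wrongly mark as defect-free because their only defects are discovered in a later release. The data layout (defects known by the release that introduced them and the release where they were discovered) is an assumption for illustration, not the paper's dataset format.

```python
# Minimal sketch of "sleeping defects" and "snoring classes", assuming each
# defect carries the index of the release that introduced it and the index of
# the release where it was discovered (indices into an ordered release list).
from dataclasses import dataclass

@dataclass
class Defect:
    class_name: str
    introduced_release: int   # release that introduced the defect
    discovered_release: int   # release where the defect was discovered

def sleeping_releases(defect: Defect) -> range:
    """Releases in which the defect exists but has not been discovered yet."""
    return range(defect.introduced_release, defect.discovered_release)

def snoring_classes(defects: list[Defect], release: int) -> set[str]:
    """Classes that, at `release`, are affected only by still-sleeping defects.

    A dataset snapshotted at `release` would label these classes defect-free,
    even though they already contain a defect (the "snoring" noise).
    """
    sleeping = {d.class_name for d in defects
                if d.introduced_release <= release < d.discovered_release}
    awake = {d.class_name for d in defects
             if d.introduced_release <= release and d.discovered_release <= release}
    return sleeping - awake

# Example: a defect introduced in release 2 and discovered in release 5
# makes its class snore in releases 2, 3 and 4.
bugs = [Defect("Parser", 2, 5), Defect("Lexer", 1, 2)]
print(snoring_classes(bugs, 3))  # {'Parser'}
```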

    What if a bug has a different origin? Making sense of bugs without an explicit bug introducing change

    No full text
    Background: Many studies in the software research literature on bug fixing are built upon the assumption that "a given bug was introduced by the lines of code that were modified to fix it", or variations of it. Although this assumption seems very reasonable at first glance, there is little empirical evidence supporting it. A careful examination surfaces other possible sources for the introduction of bugs, such as modifications to those lines that happened before the last change, and changes external to the piece of code being fixed. Goal: We aim at understanding the complex phenomenon of bug introduction and bug fixing. Method: We design a preliminary approach distinguishing between bug introducing commits (BIC) and first failing moments (FFM). We apply this approach to Nova and ElasticSearch, two large and well-known open source software projects. Results: In our initial results we find that at least 24% of the bug fixes in Nova and 10% in ElasticSearch have not been caused by a BIC but by co-evolution, compatibility issues, or bugs in external APIs. Merely 26--29% of BICs can be found using the algorithm based on the assumption that "a given bug was introduced by the lines of code that were modified to fix it". Conclusions: The approach also allows for a better framing of the comparison of automatic methods to find bug inducing changes. Our results indicate that more attention should be paid to whether and when a bug has been introduced.
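    The assumption being questioned here underlies SZZ-style algorithms: blame the lines removed or modified by a bug-fixing commit to find candidate bug-introducing commits. The sketch below illustrates that heuristic with plain git commands; the repository path and commit hash are hypothetical placeholders, and the paper's point is precisely that this heuristic misses origins such as co-evolution or changes in external APIs.

```python
# Minimal sketch of the SZZ-style heuristic: blame the lines deleted by a
# bug-fixing commit to collect candidate bug-introducing commits (BICs).
import subprocess

HEX = set("0123456789abcdef")

def candidate_bics(repo: str, fix_commit: str) -> set[str]:
    """Return commits that last touched the lines deleted by `fix_commit`."""
    diff = subprocess.run(
        ["git", "-C", repo, "diff", "--unified=0", f"{fix_commit}~1", fix_commit],
        capture_output=True, text=True, check=True,
    ).stdout

    bics: set[str] = set()
    current_file = None
    for line in diff.splitlines():
        if line.startswith("--- a/"):
            current_file = line[len("--- a/"):]
        elif line.startswith("@@") and current_file:
            # Hunk header looks like "@@ -start,count +start,count @@".
            old_range = line.split()[1].lstrip("-")
            start, _, count = old_range.partition(",")
            start, count = int(start), int(count or 1)
            if count == 0:
                continue  # pure addition: no pre-existing lines to blame
            blame = subprocess.run(
                ["git", "-C", repo, "blame", "--porcelain",
                 "-L", f"{start},{start + count - 1}",
                 f"{fix_commit}~1", "--", current_file],
                capture_output=True, text=True, check=True,
            ).stdout
            for blame_line in blame.splitlines():
                parts = blame_line.split()
                # Commit headers in porcelain output start with a 40-char SHA-1.
                if parts and len(parts[0]) == 40 and set(parts[0]) <= HEX:
                    bics.add(parts[0])
    return bics

# Example (hypothetical repository and fix commit):
# print(candidate_bics("/path/to/nova", "abc123def456..."))
```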

    Use and misuse of the term "Experiment" in mining software repositories research

    Get PDF
    The significant momentum and importance of Mining Software Repositories (MSR) in Software Engineering (SE) have fostered new opportunities and challenges for extensive empirical research. However, MSR researchers seem to struggle to situate the empirical methods they use within the existing empirical SE body of knowledge. This is especially the case for MSR experiments. To provide evidence on the special characteristics of MSR experiments and their differences from experiments traditionally acknowledged in SE, we elicited the hallmarks that differentiate an experiment from other types of empirical studies and characterized the hallmarks and types of experiments in MSR. We analyzed MSR literature obtained from a small-scale systematic mapping study to assess the use of the term "experiment" in MSR. We found that 19% of the papers claiming to be an experiment are in fact not experiments at all but rather observational studies, so they use the term in a misleading way. Of the remaining 81% of the papers, only one refers to a genuine controlled experiment, while the others are experiments with limited control. MSR researchers tend to overlook such limitations, compromising the interpretation of the results of their studies. We provide recommendations and insights to support the improvement of MSR experiments. This work has been partially supported by the Spanish project MCI PID2020-117191RB-I00.

    Evaluating the Effectiveness of Code2Vec for Bug Prediction When Considering That Not All Bugs Are the Same

    Get PDF
    Bug prediction is an area of research focused on predicting where in a software project future bugs will occur. The purpose of bug prediction models is to help companies spend their quality assurance resources more efficiently by prioritizing the testing of the most defect-prone entities. Most bug prediction models are only concerned with predicting whether an entity has a bug, or how many bugs an entity will have, which implies that all bugs have the same importance. In reality, bugs can have vastly different origins, impacts, priorities, and costs; therefore, bug prediction models could potentially be improved if they were able to give an indication of which bugs to prioritize based on an organization's needs. This paper evaluates a possible method for predicting bug attributes related to cost by analyzing over 33,000 bugs from 11 different projects. If bug attributes related to cost can be predicted, then bug prediction models can use the approach to improve the granularity of their results. The cost metrics in this study are bug priority, the experience of the developer who fixed the bug, and the size of the bug fix. First, it is shown that bugs differ along each cost metric, and prioritizing buggy entities along each of these metrics will produce very different results. We then evaluate two methods of predicting cost metrics: traditional deep learning models, and semantic learning models. The results of the analysis found evidence that traditional independent variables show potential as predictors of cost metrics. The semantic learning model was not as successful, but may show more effectiveness in future iterations.
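    As a rough illustration of the "traditional independent variables" baseline described above, the sketch below trains a simple classifier to predict a cost-related bug attribute (priority) from conventional product and process metrics. The input file, feature names, and the choice of a random forest are assumptions for illustration, not the paper's exact pipeline.

```python
# Minimal sketch: predicting a cost-related bug attribute (priority) from
# traditional metrics. File name and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

bugs = pd.read_csv("bugs.csv")  # one row per bug (hypothetical dataset)

features = ["loc", "cyclomatic_complexity", "churn", "past_bug_count"]
X = bugs[features]
y = bugs["priority"]  # e.g. "low" / "medium" / "high"

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```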