
    Statistical Analysis for Revealing Defects in Software Projects: Systematic Literature Review

Mahmoud, A. N., & Santos, V. (2021). Statistical Analysis for Revealing Defects in Software Projects: Systematic Literature Review. International Journal of Advanced Computer Science and Applications, 12(11), 237-249. https://doi.org/10.14569/IJACSA.2021.0121128

    Defect detection in software is the procedure of identifying parts of software that may contain defects. Software companies always seek to improve the performance of software projects in terms of quality and efficiency. They also seek to deliver software projects to their communities free of defects and on time. Revealing defects early in software projects also helps to avoid project failure and to save costs, team effort, and time. Therefore, these companies need to build an intelligent model capable of detecting software defects accurately and efficiently. The paper is organized as follows. Section 2 presents the materials and methods: PRISMA, the research questions, and the search strategy. Section 3 presents the results with an analysis and discussion, including a visual analysis and an analysis per topic. Section 4 presents the methodology. Finally, Section 5 discusses the conclusion. The search string was applied to all electronic repositories, looking for papers published between 2015 and 2021, which resulted in 627 publications. The results focused on three important points, found by linking the manuscript analysis to the results of the bibliometric analysis. First, the number of defects and the number of lines of code are among the most important factors used in revealing software defects. Second, neural networks and regression analysis are among the most important intelligent and statistical methods used for this purpose. Finally, the accuracy metric and the error rate are among the most important metrics used to compare the efficiency of statistical and intelligent models.

    Statistical Analysis for Revealing Defects in Software Projects

Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Information Systems and Technologies Management.

    Defect detection in software is the procedure of identifying parts of software that may contain defects. Software companies always seek to improve the performance of software projects in terms of quality and efficiency. They also seek to deliver software projects to their communities free of defects and on time. Revealing defects early in software projects also helps to avoid project failure and to save costs, team effort, and time. Therefore, these companies need to build an intelligent model capable of detecting software defects accurately and efficiently. This study seeks to achieve two main objectives. The first is to build a statistical model that identifies the critical defect factors that influence software projects. The second is to build a statistical model that reveals defects early in software projects with reasonable accuracy. A bibliometric map (VOSviewer) was used to find the relationships between the common terms in those domains. The results of this study are divided into three parts. In the first part, the term "software engineering" is connected to "cluster," "regression," and "neural network"; moreover, the terms "random forest" and "feature selection" are connected to "neural network," "recall," "software engineering," "cluster," "regression," "fault prediction model," "software defect prediction," and "defect density." In the second part, we checked and analyzed 29 manuscripts in detail, summarized their major contributions, and identified a few research gaps. In the third part, software companies try to find the critical factors that affect the detection of software defects, and any intelligent or statistical methods that help to build a model capable of detecting those defects with high accuracy. Two statistical models, multiple linear regression (MLR) and logistic regression (LR), were used to find the critical factors and, through them, to detect software defects accurately. MLR is executed using two methods: critical defect factors (CDF) and the premier list of software defect factors (PLSDF). The accuracy of MLR-CDF and MLR-PLSDF is 82.3 and 79.9, respectively; their standard errors are 26% and 28%, respectively. In addition, LR is executed using the same two methods, CDF and PLSDF. The accuracy of LR-CDF and LR-PLSDF is 86.4 and 83.8, respectively; their standard errors are 22% and 25%, respectively. Therefore, LR-CDF outperforms all the proposed models and state-of-the-art methods in terms of accuracy and standard error.
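    As a rough illustration of the modeling setup described above, the following Python sketch fits a logistic regression classifier and a multiple linear regression on a table of candidate defect factors. The file name, factor columns, and outcome columns are placeholders, not the study's actual CDF or PLSDF factor lists.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Assumed input: one row per module, with candidate defect factors and
# outcomes. Both the file and the column names are hypothetical.
data = pd.read_csv("defect_factors.csv")
factors = ["loc", "complexity", "churn", "num_devs"]  # illustrative factors

train, test = train_test_split(data, test_size=0.2, random_state=0)

# Logistic regression: classify modules as defective or not (the LR models).
lr = LogisticRegression(max_iter=1000).fit(train[factors], train["defective"])
print("LR accuracy:", accuracy_score(test["defective"],
                                     lr.predict(test[factors])))

# Multiple linear regression: estimate defect counts (the MLR models).
mlr = LinearRegression().fit(train[factors], train["defect_count"])
print("MLR R^2:", mlr.score(test[factors], test["defect_count"]))
```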

    Exploring the Influence of Identifier Names on Code Quality: An empirical study

Given the importance of identifier names and the value of naming conventions to program comprehension, in previous work we speculated whether a connection exists between the quality of identifier names and software quality. We found that flawed identifiers in Java classes were associated with source code found to be of low quality by static analysis. This paper extends that work in three directions. First, we show that the association also holds at the finer granularity level of Java methods. This, in turn, makes it possible to, second, apply existing method-level quality and readability metrics, and see that flawed identifiers still impact this richer notion of code quality and comprehension. Third, we check whether the association can be used in a practical way. We adopt techniques used to evaluate medical diagnostic tests in order to identify which particular identifier naming flaws could be used as a lightweight diagnostic of potentially problematic Java source code for maintenance.
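    The diagnostic-test framing can be made concrete with a small, self-contained sketch: treating the presence of a particular naming flaw as a "test" for low-quality code and computing sensitivity, specificity, and the positive likelihood ratio. The counts below are invented for illustration, not values from the paper.

```python
# Minimal sketch of diagnostic-test evaluation for one naming flaw.
def diagnostic_stats(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn)   # flawed names among truly low-quality methods
    specificity = tn / (tn + fp)   # clean names among acceptable methods
    lr_plus = sensitivity / (1 - specificity)  # strength as a diagnostic
    return sensitivity, specificity, lr_plus

# Placeholder contingency counts for methods with/without the flaw.
sens, spec, lr_plus = diagnostic_stats(tp=40, fp=15, fn=25, tn=120)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} LR+={lr_plus:.2f}")
```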

    Characterizing and Predicting Blocking Bugs in Open Source Projects

Software engineering researchers have studied specific types of issues such as reopened bugs, performance bugs, dormant bugs, etc. However, one especially severe type of bug is the blocking bug. Blocking bugs are software bugs that prevent other bugs from being fixed. These bugs may increase maintenance costs, reduce overall quality, and delay the release of software systems. In this paper, we study blocking bugs in eight open source projects and propose a model to predict them early on. We extract 14 different factors (from the bug repositories) that are made available within 24 hours after the initial submission of the bug reports. Then, we build decision trees to predict whether a bug will be a blocking bug or not. Our results show that our prediction models achieve F-measures of 21%-54%, which is a two-fold improvement over the baseline predictors. We also analyze the fixes of these blocking bugs to understand their negative impact. We find that fixing blocking bugs requires more lines of code to be touched compared to non-blocking bugs. In addition, our file-level analysis shows that files affected by blocking bugs are more negatively impacted in terms of cohesion, coupling, complexity, and size than files affected by non-blocking bugs.
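    To make the prediction setup concrete, here is a hedged Python sketch of a decision-tree classifier trained on early bug-report factors and evaluated with the F-measure the study reports. The file name and feature columns are illustrative stand-ins, not the paper's actual 14 factors.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Hypothetical table: one row per bug report, numeric factors extracted
# within 24 hours of submission, plus a blocking/non-blocking label.
reports = pd.read_csv("bug_reports.csv")
features = ["description_len", "num_comments", "reporter_experience",
            "severity", "component_id"]   # invented stand-in factors

X_train, X_test, y_train, y_test = train_test_split(
    reports[features], reports["is_blocking"], test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(max_depth=5, random_state=0)
tree.fit(X_train, y_train)

# F-measure, the metric the study reports (21%-54% across projects).
print("F-measure:", f1_score(y_test, tree.predict(X_test)))
```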

    Understanding the Impact of Diversity in Software Bugs on Bug Prediction Models

Nowadays, software systems are essential for businesses, users, and society. At the same time, such systems are growing both in complexity and size. In this context, developing high-quality software is a challenging and expensive activity for the software industry. Since software organizations are always limited by their budget, personnel, and time, it is not a trivial task to allocate testing and code-review resources to the areas that require the most attention. To overcome this problem, researchers have developed software bug prediction models that can help practitioners predict the most bug-prone software entities. Although software bug prediction is a very popular research area, its industrial adoption remains limited. In this thesis, we investigate three possible issues with the current state-of-the-art in software bug prediction that affect the practical usability of prediction models. First, we argue that current bug prediction models implicitly assume that all bugs are the same, without taking their impact into consideration. We study the impact of bugs in terms of the experience of the developers required to fix them. Second, only a few studies investigate the impact of specific types of bugs. Therefore, we characterize a severe type of bug called blocking bugs, and provide approaches to predict them early on. Third, false-negative files are buggy files that bug prediction models incorrectly classify as non-buggy. We argue that a large number of false-negative files makes bug prediction models less attractive for developers. In our thesis, we quantify the extent of false-negative files and manually inspect them in order to better understand their nature.

    Leveraging Identifier Naming Structures in Source Code and Bug Reports to Localize Relevant Bugs

When bugs are found in source code, bug reports are created that contain relevant information for developers to locate and fix the bug. In large source code repositories, it can be difficult and time-consuming for developers to manually analyze bug reports to locate a bug. The discovery of patterns between bug reports and source files has led to the creation of automated tools using various techniques. Automated bug localization techniques can reduce the amount of manual effort required by developers by ranking the most probable location of the bug using textual information from bug reports and source code. Although these approaches offer some assistance, the lexical mismatch between bug reports and source code makes it difficult to accurately locate the buggy source code file(s) using Information Retrieval (IR) techniques. Our research proposes a technique that takes advantage of the lexical and structural patterns observed in source code identifier names to help offset the mismatch between bug reports and their related source code files. Our observations reveal that there are lexical and structural identifier naming trends for different identifier types in the source code. Based on an empirical analysis of the two open-source repositories ElasticSearch and RxJava, we collected frequencies for the identifier patterns observed across each project and applied those frequencies to matched word occurrences in the bug reports of our evaluation data set. Specifically, we modify the significance of a matched word by altering its weight in the Term Frequency-Inverse Document Frequency (TF-IDF) vectorization of that particular bug report. The idea behind this approach is that if we come across a word perceived to be significant, based on our observed identifier pattern frequency data, we can apply a weight to that word in the bug report vectorization to increase the cosine similarity score between the bug report and source file vectors. This work expands and improves upon previous work by Gharibi et al. [1], who propose a multi-component approach that uses token matching, stack traces, semantic similarity, and a revised vector space model (rVSM). Our approach modifies the rVSM component, and our work is evaluated on the same three open-source software projects: AspectJ, SWT, and ZXing. The results of our approach are comparable to those of Gharibi et al., with an improvement in some cases, and our work outperforms many existing bug localization approaches when evaluated with the Top@N, Mean Reciprocal Rank (MRR), and Mean Average Precision (MAP) metrics across the three open-source projects.
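    The core weighting idea lends itself to a short illustration. The sketch below (not the authors' implementation) boosts the TF-IDF weights of bug-report terms that match assumed identifier naming patterns, then ranks source files by cosine similarity; the documents, pattern terms, and boost factor are all invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus: tokenized source-file text plus one bug report.
source_files = {"Parser.java": "parse token stream build syntax tree",
                "HttpClient.java": "open http connection send request"}
bug_report = "crash while building the syntax tree from a token stream"
pattern_terms = {"token", "tree"}   # hypothetical identifier-pattern terms
BOOST = 2.0                         # assumed weighting factor

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(list(source_files.values()) + [bug_report])
vocab = vectorizer.vocabulary_

# Up-weight report terms that match the identifier naming patterns.
report_vec = matrix[-1].toarray()
for term in pattern_terms:
    if term in vocab:
        report_vec[0, vocab[term]] *= BOOST

# Rank source files by cosine similarity to the re-weighted report vector.
scores = cosine_similarity(report_vec, matrix[:-1]).ravel()
for name, score in sorted(zip(source_files, scores), key=lambda p: -p[1]):
    print(f"{score:.3f}  {name}")
```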

    Evaluating defect prediction approaches: a benchmark and an extensive comparison

Reliably predicting software defects is one of the holy grails of software engineering. Researchers have devised and implemented a plethora of defect/bug prediction approaches varying in terms of accuracy, complexity, and the input data they require. However, the absence of an established benchmark makes it hard, if not impossible, to compare approaches. We present a benchmark for defect prediction, in the form of a publicly available dataset consisting of several software systems, and provide an extensive comparison of well-known bug prediction approaches, together with novel approaches we devised. We evaluate the performance of the approaches using different performance indicators: classification of entities as defect-prone or not, and ranking of the entities, with and without taking into account the effort to review an entity. We performed three sets of experiments aimed at (1) comparing the approaches across different systems, (2) testing whether the differences in performance are statistically significant, and (3) investigating the stability of approaches across different learners. Our results indicate that, while some approaches perform better than others in a statistically significant manner, external validity in defect prediction is still an open problem, as generalizing results to different contexts/learners proved to be a partially unsuccessful endeavor.
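    One way to picture experiment set (2) is a paired significance test over the per-system scores of two approaches. The abstract does not name the exact test, so the Wilcoxon signed-rank test below is only a plausible stand-in, and the scores are placeholders.

```python
from scipy.stats import wilcoxon

# Performance indicator (e.g., a classification score) of two approaches
# measured on the same set of systems; values are invented.
approach_a = [0.71, 0.68, 0.74, 0.80, 0.66]
approach_b = [0.69, 0.70, 0.70, 0.75, 0.64]

# Paired non-parametric test over per-system scores.
stat, p = wilcoxon(approach_a, approach_b)
print(f"Wilcoxon statistic={stat:.2f}, p={p:.3f}")
# A small p-value suggests the performance difference is significant.
```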

    Experience: Quality benchmarking of datasets used in software effort estimation

Data is a cornerstone of empirical software engineering (ESE) research and practice. Data underpin numerous process and project management activities, including the estimation of development effort and the prediction of the likely location and severity of defects in code. Serious questions have been raised, however, over the quality of the data used in ESE. Data quality problems caused by noise, outliers, and incompleteness have been noted as especially prevalent. Other quality issues, although also potentially important, have received less attention. In this study, we assess the quality of 13 datasets that have been used extensively in research on software effort estimation. The quality issues considered in this article draw on a taxonomy that we published previously, based on a systematic mapping of data quality issues in ESE. Our contributions are as follows: (1) an evaluation of the “fitness for purpose” of these commonly used datasets and (2) an assessment of the utility of the taxonomy in terms of dataset benchmarking. We also propose a template that could be used both to improve the ESE data collection/submission process and to evaluate other such datasets, contributing to enhanced awareness of data quality issues in the ESE community and, in time, to the availability and use of higher-quality datasets.

    MiSFIT: Mining Software Fault Information and Types

As software becomes more important to society, the number, age, and complexity of software systems grow. Software organizations require continuous process improvement to maintain the reliability, security, and quality of these systems. Software organizations can use data from manual fault classification to meet their process improvement needs, but they often lack the expertise or resources to implement such classification correctly. This dissertation addresses the need to automate software fault classification. Validation results show that automated fault classification, as implemented in the MiSFIT tool, can group faults of a similar nature, and the resulting classifications show good agreement for common software faults with no manual effort. To evaluate the method and tool, I develop and apply an extended change taxonomy to classify the source code changes that repaired software faults in an open source project. MiSFIT clusters the faults based on these changes. I manually inspect a random sample of faults from each cluster to validate the results. The automatically classified faults are then used to analyze the evolution of a software application over seven major releases. The contributions of this dissertation are an extended change taxonomy for software fault analysis, a method to cluster faults by the syntax of the repair, empirical evidence that fault distribution varies according to the purpose of the module, and the identification of project-specific trends from the analysis of the changes.
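    As a loose illustration of the clustering step, the sketch below represents each fault by tokens describing its repairing change and groups similar repairs. The change tokens and cluster count are invented; they are not MiSFIT's actual taxonomy or algorithm.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

# Each string lists hypothetical change-taxonomy tokens extracted from the
# source code change that repaired one fault.
fault_changes = [
    "add-null-check modify-condition",
    "modify-condition change-operator",
    "add-method-call add-parameter",
    "add-null-check add-guard",
    "add-parameter change-signature",
]

# Vectorize the change tokens and cluster faults with similar repairs.
X = CountVectorizer(token_pattern=r"[\w-]+").fit_transform(fault_changes)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for change, label in zip(fault_changes, labels):
    print(label, change)
```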

    Software Testing of DSpace Digital Library Software

In today's ever-changing world of technology and information, a growing number of organizations and universities seek to store digital documents in an online, easily accessible manner. This is possible by establishing and maintaining a "Digital Library" in the institution. These "Digital Library" repositories can be powerful systems that allow institutions to store and maintain their digital documents and allow interaction and collaboration among users in the organizations.