1,466 research outputs found

    A Fine-grained Data Set and Analysis of Tangling in Bug Fixing Commits

    Get PDF
    Context: Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they actually study not only bugs, but also other concerns irrelevant for the study of bugs. Objective: We want to improve our understanding of the prevalence of tangling and the types of changes that are tangled within bug fixing commits. Methods: We use a crowd sourcing approach for manual labeling to validate which changes contribute to bug fixes for each line in bug fixing commits. Each line is labeled by four participants. If at least three participants agree on the same label, we have consensus. Results: We estimate that between 17% and 32% of all changes in bug fixing commits modify the source code to fix the underlying problem. However, when we only consider changes to the production code files this ratio increases to 66% to 87%. We find that about 11% of lines are hard to label leading to active disagreements between participants. Due to confirmed tangling and the uncertainty in our data, we estimate that 3% to 47% of data is noisy without manual untangling, depending on the use case. Conclusion: Tangled commits have a high prevalence in bug fixes and can lead to a large amount of noise in the data. Prior research indicates that this noise may alter results. As researchers, we should be skeptics and assume that unvalidated data is likely very noisy, until proven otherwise.Comment: Status: Accepted at Empirical Software Engineerin

    Comparing text-based and dependence-based approaches for determining the origins of bugs

    Get PDF
    Identifying bug origins – the point where erroneous code was introduced – is crucial for many software engineering activities, from identifying process weaknesses to gathering data to support bug detection tools. Unfortunately, this information is not usually recorded when fixing bugs, and recovering it later is challenging. Recently, the text approach and the dependence approach have been developed to tackle this problem. Respectively, they examine textual and dependence-related changes that occurred prior to a bug fix. However, only limited evaluation has been carried out, partially because of a lack of available implementations and of datasets linking bugs to origins. To address this, origins of 174 bugs in three projects were manually identified and compared to a simulation of the approaches. Both approaches were partially successful across a variety of bugs – achieving 29–79% precision and 40–70% recall. Results suggested the precise definition of program dependence could affect performance, as could whether the approaches identified a single or multiple origins. Some potential improvements are explored in detail and identify pragmatic strategies for combining techniques along with simple modifications. Even after adopting these improvements, there remain many challenges: large commits, unrelated changes and long periods between origins and fixes all reduce effectiveness

    Studying the Use of SZZ with Non-functional bugs

    Get PDF
    Non-functional bugs bear a heavy cost on both software developers and end-users. Tools to reduce the occurrence, impact, and repair time of non-functional bugs can therefore provide key assistance for software developers racing to fix these issues. Classification models that focus on identifying defect-prone commits, referred to as \emph{Just-In-Time (JIT) Quality Assurance} are known to be useful in allowing developers to review risky commits. JIT models, however, leverage the SZZ approach to identify whether or not a past change is bug-inducing. However, the due to the nature of non-functional bugs, their fixes may be scattered and separate from their bug-inducing locations in the source code. Yet, prior studies that leverage or evaluate the SZZ approach do not consider non-functional bugs, leading to potential bias on the results. In this thesis, we conduct an empirical study on the results of the SZZ approach on the non-functional bugs in the NFBugs dataset, and the performance bugs in Cassandra, and Hadoop. We manually examine whether each identified bug-inducing change is indeed the correct bug-inducing change. Our manual study shows that a large portion of non-functional bugs cannot be properly identified by the SZZ approach. We uncover root causes for false detection that have not been previously found. We evaluate the identified bug-inducing changes based on criteria from prior research. Our results may be used to assist in future research on non-functional bugs, and highlight the need to complement SZZ to accommodate the unique characteristics of non-functional bugs. Furthermore, we conduct an empirical study to evaluate model performance for JIT models by using them to identify bug-inducing code commits for performance related bugs. Our findings show that JIT defect prediction classifies non-performance bug-inducing commits better than performance bug-inducing commits. However, we find that manually correcting errors in the training data only slightly improves the models. In the absence of a large number of correctly labelled performance bug-inducing commits, our findings show that combining all available training data yields the best classification results

    Changeset-based Retrieval of Source Code Artifacts for Bug Localization

    Get PDF
    Modern software development is extremely collaborative and agile, with unprecedented speed and scale of activity. Popular trends like continuous delivery and continuous deployment aim at building, fixing, and releasing software with greater speed and frequency. Bug localization, which aims to automatically localize bug reports to relevant software artifacts, has the potential to improve software developer efficiency by reducing the time spent on debugging and examining code. To date, this problem has been primarily addressed by applying information retrieval techniques based on static code elements, which are intrinsically unable to reflect how software evolves over time. Furthermore, as prior approaches frequently rely on exact term matching to measure relatedness between a bug report and a software artifact, they are prone to be affected by the lexical gap that exists between natural and programming language. This thesis explores using software changes (i.e., changesets), instead of static code elements, as the primary data unit to construct an information retrieval model toward bug localization. Changesets, which represent the differences between two consecutive versions of the source code, provide a natural representation of a software change, and allow to capture both the semantics of the source code, and the semantics of the code modification. To bridge the lexical gap between source code and natural language, this thesis investigates using topic modeling and deep learning architectures that enable creating semantically rich data representation with the goal of identifying latent connection between bug reports and source code. To show the feasibility of the proposed approaches, this thesis also investigates practical aspects related to using a bug localization tool, such retrieval delay and training data availability. The results indicate that the proposed techniques effectively leverage historical data about bugs and their related source code components to improve retrieval accuracy, especially for bug reports that are expressed in natural language, with little to no explicit code references. Further improvement in accuracy is observed when the size of the training dataset is increased through data augmentation and data balancing strategies proposed in this thesis, although depending on the model architecture the magnitude of the improvement varies. In terms of retrieval delay, the results indicate that the proposed deep learning architecture significantly outperforms prior work, and scales up with respect to search space size

    Improving Software Quality by Synergizing Effective Code Inspection and Regression Testing

    Get PDF
    Software quality assurance is an essential practice in software development and maintenance. Evolving software systems consistently and safely is challenging. All changes to a system must be comprehensively tested and inspected to gain confidence that the modified system behaves as intended. To detect software defects, developers often conduct quality assurance activities, such as regression testing and code review, after implementing or changing required functionalities. They commonly evaluate a program based on two complementary techniques: dynamic program analysis and static program analysis. Using an automated testing framework, developers typically discover program faults by observing program execution with test cases that encode required program behavior as well as represent defects. Unlike dynamic analysis, developers make sure of the program correctness without executing a program by static analysis. They understand source code through manual inspection or identify potential program faults with an automated tool for statically analyzing a program. By removing the boundaries between static and dynamic analysis, complementary strengths and weaknesses of both techniques can create unified analyses. For example, dynamic analysis is efficient and precise but it requires selection of test cases without guarantee that the test cases cover all possible program executions, and static analysis is conservative and sound but it produces less precise results due to its approximation of all possible behaviors that may perform at run time. Many dynamic and static techniques have been proposed, but testing a program involves substantial cost and risks and inspecting code change is tedious and error-prone. Our research addresses two fundamental problems in dynamic and static techniques. (1) To evaluate a program, developers are typically required to implement test cases and reuse them. As they develop more test cases for verifying new implementations, the execution cost of test cases increases accordingly. After every modification, they periodically conduct regression test to see whether the program executes without introducing new faults in the presence of program evolution. To reduce the time required to perform regression testing, developers should select an appropriate subset of the test suite with a guarantee of revealing faults as running entire test cases. Such regression testing selection techniques are still challenging as these methods also have substantial costs and risks and discard test cases that could detect faults. (2) As a less formal and more lightweight method than running a test suite, developers often conduct code reviews based on tool support; however, understanding context and changes is the key challenge of code reviews. While reviewing code changes—addressing one single issue—might not be difficult, it is extremely difficult to understand complex changes—including multiple issues such as bug fixes, refactorings, and new feature additions. Developers need to understand intermingled changes addressing multiple development issues, finding which region of the code changes deals with a particular issue. Although such changes do not cause trouble in implementation, investigating these changes becomes time-consuming and error-prone since the intertwined changes are loosely related, leading to difficulty in code reviews. To address the limitations outlined above, our research makes the following contributions. First, we present a model-based approach to efficiently build a regression test suite that facilitates Extended Finite State Machines (EFSMs). Changes to the system are performed at transition level by adding, deleting or replacing transition. Tests are a sequence of input and expected output messages with concrete parameter values over the supported data types. Fully-observable tests are introduced whose descriptions contain all the information about the transitions executed by the tests. An invariant characterizing fully observable tests is formulated such that a test is fully-observable whenever the invariant is a satisfiable formula. Incremental procedures are developed to efficiently evaluate the invariant and to select tests from a test suite that are guaranteed to exercise a given change when the tests run on a modified EFSM. Tests rendered unusable due to a change are also identified. Overlaps among the test descriptions are exploited to extend the approach to simultaneously select and discard multiple tests to alleviate the test selection costs. Although test regression selection problem is NP-hard [78], the experimental results show the cost of our test selection procedure is still acceptable and economical. Second, to support code review and regression testing, we present a technique, called ChgCutter. It helps developers understand and validate composite changes as follows. It interactively decomposes these complex, composite changes into atomic changes, builds related change subsets using program dependence relationships without syntactic violation, and safely selects only related test cases from the test suite to reduce the time to conduct regression testing. When a code reviewer selects a change region from both original and changed versions of a program, ChgCutter automatically identifies similar change regions based on the dependence analysis and the tree-based code search technique. By automatically applying a change to the identified regions in an original program version, ChgCutter generates a program version which is a syntactically correct version of program. Given a generated program version, it leverages a testing selection technique to select and run a subset of the test suite affected by a change automatically separated from mixed changes. Based on the iterative change selection process, there can be each different program version that include its separated change. Therefore, ChgCutter helps code reviewers inspect large, complex changes by effectively focusing on decomposed change subsets. In addition to assisting understanding a substantial change, the regression testing selection technique effectively discovers defects by validating each program version that contains a separated change subset. In the evaluation, ChgCutter analyzes 28 composite changes in four open source projects. It identifies related change subsets with 95.7% accuracy, and it selects test cases affected by these changes with 89.0% accuracy. Our results show that ChgCutter should help developers effectively inspect changes and validate modified applications during development
    • …
    corecore