836 research outputs found
UNIT-LEVEL ISOLATION AND TESTING OF BUGGY CODE
In real-world software development, maintenance plays a major role and developers spend 50-80% of their time on maintenance-related activities. During software maintenance, a significant amount of effort is spent on finding and fixing bugs. In some cases, the fix does not completely eliminate the buggy behavior; though it addresses the reported problem, it fails to account for conditions that could lead to similar failures. There could be many possible reasons: the conditions may have been overlooked or difficult to reproduce, e.g., when the components that invoke the code or the underlying components it interacts with cannot put it in a state where latent errors appear. We posit that such latent errors can be discovered sooner if the buggy section can be tested more thoroughly in a separate environment, a strategy loosely analogous to the medical procedure of performing a biopsy, where tissue is removed, examined, and subjected to a battery of tests to determine the presence of a disease.
In this thesis, we propose a process in which the buggy code is extracted and isolated in a test framework. Test drivers and stubs are added to exercise the code and observe its interactions with its dependencies. We lay the groundwork for the creation of an automated tool for isolating code by studying its feasibility and investigating existing testing technologies that can facilitate the creation of such drivers and stubs. We investigate mocking frameworks, symbolic execution, and model checking tools, and test their capabilities by examining real bugs from the Apache Tomcat project. We demonstrate the merits of performing unit-level symbolic execution and model checking to discover runtime exceptions and logical errors. The process is shown to have high coverage and to be able to uncover latent errors caused by insufficient fixes.
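The driver-and-stub isolation this abstract describes can be pictured with a minimal sketch. The names (`ConnectionPool`, `RequestHandler`, `ExhaustedPoolStub`) are hypothetical, not from the thesis; the stub forces the dependency into a hard-to-reproduce state so the latent error surfaces in isolation:

```java
// Sketch: isolating a buggy unit behind a stubbed dependency.
// All class names here are illustrative, not taken from the thesis.

interface ConnectionPool {
    String acquire();  // returns a connection id, or null when exhausted
}

// The "buggy" unit under test: forgets to handle an exhausted pool.
class RequestHandler {
    private final ConnectionPool pool;
    RequestHandler(ConnectionPool pool) { this.pool = pool; }

    String handle(String request) {
        String conn = pool.acquire();
        // Latent bug: no null check, so a NullPointerException escapes
        // when the pool is exhausted.
        return "served " + request + " on " + conn.trim();
    }
}

// Stub: puts the dependency into the hard-to-reproduce "exhausted" state.
class ExhaustedPoolStub implements ConnectionPool {
    public String acquire() { return null; }
}

public class IsolationDriver {
    public static void main(String[] args) {
        RequestHandler handler = new RequestHandler(new ExhaustedPoolStub());
        try {
            handler.handle("GET /index");
            System.out.println("no error surfaced");
        } catch (NullPointerException e) {
            System.out.println("latent error exposed: NullPointerException");
        }
    }
}
```

In a real setting the stub would typically be generated by a mocking framework, and a symbolic execution engine would search for the input states (here, a null return) that trigger the exception.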
Large-Scale Identification and Analysis of Factors Impacting Simple Bug Resolution Times in Open Source Software Repositories
One of the most prominent issues the ever-growing open-source software community faces is the abundance of buggy code. Well-established version control systems and repository hosting services such as GitHub and Maven provide a checks-and-balances structure to minimize the amount of buggy code introduced. Although these platforms are effective in mitigating the problem, it still remains. To further the efforts toward a more effective and quicker response to bugs, we must understand the factors that affect the time it takes to fix one. We apply a custom traversal algorithm to commits made to open-source repositories to determine when “simple stupid bugs” were first introduced to projects and explore the factors that drive the time it takes to fix them. Using the commit history from the main development branch, we are able to identify the commit that first introduced 13 different types of simple stupid bugs in 617 of the top Java projects on GitHub. Leveraging a statistical survival model and other non-parametric statistical tests, we found two main categories of variables that affect a bug’s life: Time Factors and Author Factors. We find that bugs are fixed quicker if they are introduced and resolved by the same developer. Further, we discuss how the day of the week and the time of day at which buggy code was written and fixed affect its resolution time. These findings provide vital insight to help the open-source community mitigate the abundance of buggy code and can be used in future research to aid bug-finding programs.
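The core step of the method, finding the commit that first introduced a buggy line by walking the main branch's history, can be sketched in miniature. The commit contents below are invented, and the paper's actual traversal algorithm is more involved than this linear scan:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy sketch: identify the bug-introducing commit by traversing the
// main branch's history oldest-to-newest. Commit data is invented.
public class BugIntroducingCommit {
    // history maps commit id -> file snapshot, in chronological order.
    static String findIntroducingCommit(Map<String, String> history, String buggyLine) {
        for (Map.Entry<String, String> commit : history.entrySet()) {
            if (commit.getValue().contains(buggyLine)) {
                return commit.getKey();  // first commit containing the buggy line
            }
        }
        return null;  // buggy line never appeared on this branch
    }

    public static void main(String[] args) {
        Map<String, String> history = new LinkedHashMap<>();
        history.put("c1", "int n = items.size();");
        // "simple stupid bug": assignment instead of comparison.
        history.put("c2", "int n = items.size();\nif (n = 0) return;");
        history.put("c3", "int n = items.size();\nif (n = 0) return;\nlog(n);");
        System.out.println(findIntroducingCommit(history, "if (n = 0) return;"));
    }
}
```

The gap between this introducing commit's timestamp and the fixing commit's timestamp is the resolution time the survival analysis then models.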
FixEval: Execution-based Evaluation of Program Fixes for Programming Problems
The increasing complexity of software has led to a drastic rise in time and
costs for identifying and fixing bugs. Various approaches are explored in the
literature to generate fixes for buggy code automatically. However, few tools
and datasets are available to evaluate model-generated fixes effectively due to
the large combinatorial space of possible fixes for a particular bug. In this
work, we introduce FIXEVAL, a benchmark comprising buggy code submissions to
competitive programming problems and their respective fixes. FIXEVAL is
composed of a rich test suite to evaluate and assess the correctness of
model-generated program fixes and further information regarding time and memory
constraints and acceptance based on a verdict. We consider two Transformer
language models pretrained on programming languages as our baselines and
compare them using match-based and execution-based evaluation metrics. Our
experiments show that match-based metrics do not accurately reflect the quality
of model-generated program fixes, whereas execution-based methods evaluate
programs against all the cases and scenarios designed explicitly for that solution.
Therefore, we believe FIXEVAL provides a step towards real-world automatic bug
fixing and model-generated code evaluation. The dataset and models are
open-sourced.\footnote{\url{https://github.com/mahimanzum/FixEval}}
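The distinction the abstract draws between match-based and execution-based metrics can be illustrated with a toy sketch. This is not FixEval's actual harness; the reference fix, candidate fix, and test cases below are invented, but they show how a semantically correct fix can fail exact match while passing execution:

```java
import java.util.function.IntUnaryOperator;

// Toy illustration of match-based vs. execution-based fix evaluation.
public class FixEvaluationSketch {
    // Reference (ground-truth) fix as source text: sum of 1..n.
    static final String REFERENCE_SRC = "n -> n * (n + 1) / 2";

    // A model-generated fix that is textually different but semantically equal.
    static final String CANDIDATE_SRC = "n -> (n * n + n) / 2";
    static final IntUnaryOperator CANDIDATE = n -> (n * n + n) / 2;

    // Match-based metric: exact string comparison against the reference.
    static boolean exactMatch(String candidate) {
        return REFERENCE_SRC.equals(candidate);
    }

    // Execution-based metric: run the candidate on the problem's test cases.
    static boolean passesTests(IntUnaryOperator fix) {
        int[][] cases = { {1, 1}, {4, 10}, {10, 55} };  // {input, expected}
        for (int[] c : cases) {
            if (fix.applyAsInt(c[0]) != c[1]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("exact match:  " + exactMatch(CANDIDATE_SRC)); // false
        System.out.println("passes tests: " + passesTests(CANDIDATE));    // true
    }
}
```

The candidate is rejected by the match-based metric yet accepted by the execution-based one, which is precisely the mismatch the benchmark is designed to expose.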
Conceptual model for software fault localization
Existing cognitive science and psychology studies suggest that a bi-level approach to fault localization is needed, with both shallow and deep reasoning. This approach forms the underpinnings for developing our Conceptual Model for Software Fault Localization (CMSFL) to aid programmers with the problem of software fault localization. Our CMSFL proposes that, during the fault localization process, programmers build two mental models: an actual code model (the buggy code) and an expectation model (the correct code). A multi-dimensional approach with both shallow and deep reasoning phases is suggested to enhance the probability of localizing many types of faults.
Explainable Automated Debugging via Large Language Model-driven Scientific Debugging
Automated debugging techniques have the potential to reduce developer effort
in debugging, and have matured enough to be adopted by industry. However, one
critical issue with existing techniques is that, while developers want
rationales for the provided automatic debugging results, existing techniques
are ill-suited to provide them, as their deduction process differs
significantly from that of human developers. Inspired by the way developers
interact with code when debugging, we propose Automated Scientific Debugging
(AutoSD), a technique that given buggy code and a bug-revealing test, prompts
large language models to automatically generate hypotheses, uses debuggers to
actively interact with buggy code, and thus automatically reach conclusions
prior to patch generation. By aligning the reasoning of automated debugging
more closely with that of human developers, we aim to produce intelligible
explanations of how a specific patch has been generated, with the hope that the
explanation will lead to more efficient and accurate developer decisions. Our
empirical analysis on three program repair benchmarks shows that AutoSD
performs competitively with other program repair baselines, and that it can
indicate when it is confident in its results. Furthermore, we perform a human
study with 20 participants, including six professional developers, to evaluate
the utility of explanations from AutoSD. Participants with access to
explanations could judge patch correctness in roughly the same time as those
without, but their accuracy improved for five out of six real-world bugs
studied. 70% of participants answered that they wanted explanations when using
repair tools, while 55% answered that they were satisfied with the Scientific
Debugging presentation.
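The hypothesize-experiment-conclude loop that Scientific Debugging formalizes can be sketched in miniature. In AutoSD the hypotheses come from a large language model and the experiments run through a debugger; here a fixed list stands in for the model and the experiments are direct checks, and all names are invented for this sketch:

```java
import java.util.List;
import java.util.function.BooleanSupplier;

// Toy sketch of the scientific-debugging loop: propose a hypothesis about
// the bug, run an experiment that can confirm or refute it, and record the
// conclusion before attempting a patch.
public class ScientificDebuggingSketch {
    // Buggy code under repair: intended to return max(a, b).
    static int buggyMax(int a, int b) {
        return a > b ? b : a;  // bug: branch results are swapped (returns min)
    }

    // A hypothesis pairs a claim with an experiment that can test it.
    record Hypothesis(String claim, BooleanSupplier experiment) {}

    public static void main(String[] args) {
        List<Hypothesis> hypotheses = List.of(
            new Hypothesis("buggyMax mishandles equal inputs",
                () -> buggyMax(3, 3) != 3),
            new Hypothesis("buggyMax returns the smaller argument",
                () -> buggyMax(2, 5) == 2 && buggyMax(5, 2) == 2));

        for (Hypothesis h : hypotheses) {
            boolean confirmed = h.experiment().getAsBoolean();
            System.out.println((confirmed ? "CONFIRMED: " : "rejected:  ") + h.claim);
        }
    }
}
```

The confirmed hypothesis ("returns the smaller argument") is exactly the kind of intermediate conclusion that, in AutoSD, becomes the human-readable rationale accompanying the generated patch.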
- …