36 research outputs found

    Poster: Debugging Inputs

    Get PDF
    Program failures are often caused by invalid inputs, for instance due to input corruption. To obtain a passing input, one needs to debug the data. In this paper, we present a generic technique called ddmax that (1) identifies which parts of the input data prevent processing, and (2) recovers as much of the (valuable) input data as possible. To the best of our knowledge, ddmax is the first approach that fixes faults in the input data without requiring program analysis. In our evaluation, ddmax repaired about 69% of input files and recovered about 78% of the data within one minute per input.

    Locating Faults with Program Slicing: An Empirical Analysis

    Get PDF
    Statistical fault localization is an easily deployed technique for quickly determining candidates for faulty code locations. If a human programmer has to search for the fault beyond the top candidate locations, though, more traditional techniques of following dependencies along dynamic slices may be better suited. In a large study of 457 bugs (369 single faults and 88 multiple faults) in 46 open-source C programs, we compare the effectiveness of statistical fault localization against dynamic slicing. For single faults, we find that dynamic slicing was eight percentage points more effective than the best-performing statistical debugging formula; for 66% of the bugs, dynamic slicing finds the fault earlier than the best-performing statistical debugging formula. In our evaluation, dynamic slicing is more effective for programs with a single fault, but statistical debugging performs better on multiple faults. The best results, however, are obtained by a hybrid approach: if programmers first examine at most the top five most suspicious locations from statistical debugging, and then switch to dynamic slices, they will on average need to examine 15% (30 lines) of the code. These findings hold for the 18 most effective statistical debugging formulas, and our results are independent of the number of faults (i.e., single or multiple faults) and error type (i.e., artificial or real errors).
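As a concrete example of the kind of statistical debugging formula the study compares, the following sketch ranks code lines by Ochiai suspiciousness. The coverage counts are invented for illustration; only the formula itself is standard.

```python
from math import sqrt

def ochiai(ef, ep, total_failed):
    """Ochiai suspiciousness: ef/ep are the numbers of failing/passing
    runs that executed the line; total_failed is the number of failing
    runs overall."""
    if ef == 0:
        return 0.0
    return ef / sqrt(total_failed * (ef + ep))

# Invented coverage data: line -> (failing runs hitting it, passing runs)
coverage = {
    "line 3": (4, 1),   # executed by all 4 failing runs, 1 passing run
    "line 7": (4, 4),
    "line 9": (1, 5),
}
# Rank lines most-suspicious-first, as a statistical debugger would:
ranking = sorted(coverage, key=lambda l: ochiai(*coverage[l], 4), reverse=True)
print(ranking)  # ['line 3', 'line 7', 'line 9']
```

A hybrid tool in the spirit of the paper would present the top entries of such a ranking first and fall back to dynamic slicing from there.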

    Directed Grammar-Based Test Generation - Replication Package

    No full text
    Artifact Structure: This artifact consists of a project archive `TSE-Project.tar.xz` and a results archive `TSE-Results.tar.xz`. To evaluate the artifact, first extract the project archive into a folder of your choice. It contains all scripts, Dockerfiles, and executables required to run the evaluation, as well as the generated graphs and tables from the evaluation inside the `project/out` directory. To build and run the Dockerfile, consult the "Building and testing the Dockerfile on your local machine" section of the README.md file that is also contained inside the project artifact. We included all intermediate results of our evaluation inside the `TSE-Results.tar.xz` archive. To evaluate those files (including the inputs generated during our evaluation), extract this archive into the `project/workingdir` directory. It requires about 50 GiB of disk space.
    Paper Abstract: Context: To effectively test complex software, it is important to generate goal-specific inputs, i.e., inputs that achieve a specific testing goal. For instance, developers may intend to target one or more testing goals during testing, such as generating complex inputs or triggering new or error-prone behaviors. Problem: However, most state-of-the-art test generators are not designed to target specific goals. Notably, grammar-based test generators, which (randomly) produce syntactically valid inputs via an input specification (i.e., a grammar), have a low probability of achieving an arbitrary testing goal. Aim: This work addresses this challenge by proposing an automated test generation approach (called FDLOOP) which iteratively learns relevant input properties from existing inputs to drive the generation of goal-specific inputs. Method: The main idea of our approach is to leverage test feedback to generate goal-specific inputs via a combination of evolutionary testing and grammar learning. FDLOOP automatically learns a mapping between input structures and a specific testing goal; such mappings allow it to generate inputs that target the goal at hand. Given a testing goal, FDLOOP iteratively selects, evolves, and learns the input distribution of goal-specific test inputs via test feedback and a probabilistic grammar. We concretize FDLOOP for four testing goals, namely unique code coverage, input-to-code complexity, program failures (exceptions), and long execution time. We evaluate FDLOOP using three well-known input formats (JSON, CSS, and JavaScript) and 20 open-source programs. Results: FDLOOP is up to 89% more effective than the baseline grammar-based test generators (i.e., random, probabilistic, and inverse-probabilistic methods), and it outperforms the closest state-of-the-art approach (EvoGfuzz) by up to 77%. In addition, we show that the main components of FDLOOP (i.e., the input mutator and the grammar mutator) contribute positively to the effectiveness of our approach. We also observed that FDLOOP is effective across varying parameter settings: the number of initial seed inputs, the number of generated inputs, and the number of input generations. Implications: Finally, our evaluation demonstrates that FDLOOP is effective for targeting a specific testing goal, i.e., revealing error-prone behaviors, generating complex inputs, or producing inputs with long execution time, and scales to multiple testing goals.
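The probabilistic-grammar half of such an approach can be sketched as weighted random expansion. The toy grammar, weights, and function names below are ours; in FDLOOP, the weights would be (re)learned each iteration from inputs that achieved the testing goal rather than being fixed.

```python
import random

# Toy probabilistic grammar: each nonterminal maps to weighted
# alternatives. The weights 0.7/0.3 etc. are invented "learned" biases.
GRAMMAR = {
    "<value>": [(["<digit>"], 0.3), (["[", "<value>", "]"], 0.7)],
    "<digit>": [(["0"], 0.5), (["1"], 0.5)],
}

def generate(symbol="<value>", depth=0, max_depth=10):
    """Expand `symbol` by weighted random choice among its alternatives."""
    if symbol not in GRAMMAR:
        return symbol                                  # terminal symbol
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:                             # force termination:
        alternatives = [min(alternatives,              # shortest alternative
                            key=lambda aw: len(aw[0]))]
    expansion = random.choices([a for a, _ in alternatives],
                               weights=[w for _, w in alternatives])[0]
    return "".join(generate(s, depth + 1, max_depth) for s in expansion)

print(generate())
```

Biasing the weights toward expansions observed in goal-achieving inputs is what steers generation toward, e.g., deeply nested or failure-triggering inputs.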

    Customized Software Environment for Remote Learning: Providing Students a Specialized Learning Experience

    No full text
    The Covid-19 pandemic has challenged educators across the world to move their teaching and mentoring from in-person to remote. During nonpandemic semesters at their institutes (e.g. universities), educators can directly provide students the software environment needed to support their learning - either in specialized computer laboratories (e.g. computational chemistry labs) or shared computer spaces. These labs are often supported by staff that maintains the operating systems (OS) and software. But how does one provide a specialized software environment for remote teaching? One solution is to provide students a customized operating system (e.g., Linux) that includes open-source software for supporting your teaching goals. However, such a solution should not require students to install the OS alongside their existing one (i.e. dual/multi-booting) or be used as a complete replacement. Such approaches are risky because of a) the students' possible lack of software expertise, b) the possible disruption of an existing software workflow that is needed in other classes or by other family members, and c) the importance of maintaining a working computer when isolated (e.g. societal restrictions). To illustrate possible solutions, we discuss our approach that used a customized Linux OS and a Docker container in a course that teaches computational chemistry and Python3

    Debugging Assumptions Artifact

    No full text
    Artifact for "Evaluating the Impact of Experimental Assumptions in Automated Fault Localization", accepted at ICSE 2023 Technical Track.

    Mining Input Grammars from Dynamic Control Flow

    Get PDF
    One of the key properties of a program is its input specification. Having a formal input specification can be critical in fields such as vulnerability analysis, reverse engineering, software testing, clone detection, or refactoring. Unfortunately, accurate input specifications for typical programs are often unavailable or out of date. In this paper, we present a general algorithm that takes a program and a small set of sample inputs and automatically infers a readable context-free grammar capturing the input language of the program. We infer the syntactic input structure only by observing accesses of input characters at different locations of the input parser. This works on all stack-based recursive-descent input parsers, including parser combinators, and works entirely without program-specific heuristics. Our Mimid prototype produced accurate and readable grammars for a variety of evaluation subjects, including complex languages such as JSON, TinyC, and JavaScript.
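The core observation, tracking which parser function reads which input characters, can be sketched as follows. The toy parser, names, and bookkeeping are ours for illustration, not the Mimid implementation.

```python
# Wrap the input so every character access is logged together with the
# parser function currently active; (function, index) pairs then map
# parse functions to grammar nonterminals and their input spans.
accesses = []   # list of (active parse function, input index)
stack = []      # call stack of parse function names

class TracedInput:
    def __init__(self, s):
        self.s = s
    def __getitem__(self, i):
        accesses.append((stack[-1], i))   # who read which character?
        return self.s[i]

def parse_list(inp, i):
    """Toy recursive-descent parser for inputs like "[1,0,1]"."""
    stack.append("parse_list")
    assert inp[i] == "["
    i += 1
    while inp[i] != "]":
        i = parse_digit(inp, i)
        if inp[i] == ",":
            i += 1
    stack.pop()
    return i + 1

def parse_digit(inp, i):
    stack.append("parse_digit")
    assert inp[i].isdigit()
    stack.pop()
    return i + 1

parse_list(TracedInput("[1,2]"), 0)
digit_positions = sorted({i for f, i in accesses if f == "parse_digit"})
print(digit_positions)  # input positions consumed by parse_digit: [1, 3]
```

From such traces one can infer that `parse_digit` governs the substrings "1" and "2" inside the span governed by `parse_list`, suggesting grammar rules like `<list> ::= "[" <digit> ("," <digit>)* "]"`; the real technique adds heuristics to resolve characters read by several nested functions.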