Semi-automatic fault localization
One of the most expensive and time-consuming components of the debugging
process is locating the errors or faults. To locate faults, developers must identify
statements involved in failures and select suspicious statements that might contain
faults. In practice, this localization is done by developers in a tedious and manual
way, using only a single execution, targeting only one fault, and having a limited
perspective into a large search space.
The thesis of this research is that fault localization can be partially automated
with the use of commonly available dynamic information gathered from test-case
executions in a way that is effective, efficient, tolerant of test cases that pass but also
execute the fault, and scalable to large programs that potentially contain multiple
faults. The overall goal of this research is to develop effective and efficient fault
localization techniques that scale to programs of large size and with multiple faults.
There are three principal steps performed to reach this goal: (1) Develop practical
techniques for locating suspicious regions in a program; (2) Develop techniques to
partition test suites into smaller, specialized test suites to target specific faults; and
(3) Evaluate the usefulness and cost of these techniques.
In this dissertation, the difficulties and limitations of previous work in the area
of fault localization are explored. A technique, called Tarantula, is presented that
addresses these difficulties. Empirical evaluation of the Tarantula technique shows
that it is efficient and effective for many faults. The evaluation also demonstrates
that the Tarantula technique can lose effectiveness as the number of faults increases.
To address the loss of effectiveness for programs with multiple faults, supporting
techniques have been developed and are presented. The empirical evaluation of these
supporting techniques demonstrates that they can enable effective fault localization in
the presence of multiple faults. A new mode of debugging, called parallel debugging, is
developed and empirical evidence demonstrates that it can provide a savings in terms
of both total expense and time to delivery. A prototype visualization is provided to
display the fault-localization results as well as to provide a method to interact and
explore those results. Finally, a study on the effects of the composition of test suites
on fault localization is presented.
Ph.D. Committee Chair: Harrold, Mary Jean; Committee Member: Orso, Alessandro; Committee Member: Pande, Santosh; Committee Member: Reiss, Steven; Committee Member: Rugaber, Spencer
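The abstract does not spell out Tarantula's ranking formula, but as a rough illustration of coverage-based fault localization in this style, the following sketch scores each statement by how strongly its coverage correlates with failing tests. The data structures, test names, and statement labels are invented for the example.

# Sketch: coverage-based suspiciousness ranking in the style of Tarantula.
# `coverage` maps each test name to the set of statements it executes;
# `failed` is the set of failing test names. Both are illustrative inputs.

def tarantula_scores(coverage, failed):
    tests = set(coverage)
    passed = tests - failed
    total_passed = len(passed) or 1   # avoid division by zero
    total_failed = len(failed) or 1

    statements = set().union(*coverage.values())
    scores = {}
    for stmt in statements:
        p = sum(1 for t in passed if stmt in coverage[t]) / total_passed
        f = sum(1 for t in failed if stmt in coverage[t]) / total_failed
        # Suspiciousness: fraction of failing runs covering the statement,
        # normalized by its overall coverage; higher means more suspicious.
        scores[stmt] = f / (p + f) if (p + f) > 0 else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Example: statement "s3" is covered only by the failing test and ranks first.
cov = {"t1": {"s1", "s2"}, "t2": {"s1", "s3"}}
print(tarantula_scores(cov, failed={"t2"}))

Statements executed mostly by failing tests rise to the top of the ranking, which is the intuition the dissertation builds on and then extends for test suites that expose multiple faults.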
A Flexible and Non-intrusive Approach for Computing Complex Structural Coverage Metrics
Software analysis tools and techniques often leverage structural code coverage information to reason about the dynamic behavior of software. Existing techniques instrument the code with the required structural obligations and then monitor the execution of the compiled code to report coverage. Instrumentation-based approaches often incur considerable runtime overhead for complex structural coverage metrics such as Modified Condition/Decision Coverage (MC/DC). Code instrumentation, in general, has to be approached with great care to ensure it does not modify the behavior of the original code. Furthermore, instrumented code cannot be used in conjunction with other analyses that reason about the structure and semantics of the code under test. In this work, we introduce a non-intrusive preprocessing approach for computing structural coverage information. It uses a static partial evaluation of the decisions in the source code and a source-to-bytecode mapping to generate the information necessary to efficiently track structural coverage metrics during execution. Our technique is flexible; the results of the preprocessing can be used by a variety of coverage-driven software analysis tasks, including automated analyses that are not possible for instrumented code. Experimental results in the context of symbolic execution show the efficiency and flexibility of our non-intrusive approach for computing code coverage information.
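For readers unfamiliar with why MC/DC is expensive to track, here is a small, illustrative check (not the paper's technique) of what the metric demands for a single decision, a and (b or c): every condition must be shown to independently flip the decision's outcome. The decision and test vectors are made up for the example.

# Sketch: brute-force MC/DC check for one decision over observed test vectors.
from itertools import product

def decision(a, b, c):
    return a and (b or c)

def mcdc_satisfied(vectors):
    """A condition is covered when some pair of vectors differs only in that
    condition and the two vectors yield different decision outcomes."""
    covered = set()
    for v1, v2 in product(vectors, repeat=2):
        diff = [i for i in range(3) if v1[i] != v2[i]]
        if len(diff) == 1 and decision(*v1) != decision(*v2):
            covered.add(diff[0])
    return covered == {0, 1, 2}

# Four vectors suffice here: each condition is toggled in isolation.
tests = [(True, True, False), (False, True, False),
         (True, False, False), (True, False, True)]
print(mcdc_satisfied(tests))  # True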
Combining Static and Dynamic Analysis for Bug Detection and Program Understanding
This work proposes new combinations of static and dynamic analysis for bug detection and program understanding. There are three related but largely independent directions: a) In the area of dynamic invariant inference, we improve the consistency of dynamically discovered invariants by taking into account second-order constraints that encode knowledge about invariants; the second-order constraints are either supplied by the programmer or vetted by the programmer (among candidate constraints suggested automatically); b) In the area of testing dataflow (especially map-reduce) programs, our tool, SEDGE, achieves higher testing coverage by leveraging existing input data and generalizing them using a symbolic reasoning engine (a powerful SMT solver); c) In the area of bug detection, we identify and present the concept of residual investigation: a dynamic analysis that serves as the runtime agent of a static analysis. Residual investigation identifies with higher certainty whether an error reported by the static analysis is likely true.
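As a hedged illustration of direction a), the sketch below shows how a programmer-vetted second-order constraint can prune dynamically inferred invariants; the constraint, class names, and invariant strings are made up for the example and are not the paper's.

# Sketch: enforcing one second-order constraint over candidate invariants.
# Constraint used here (illustrative): an overriding method's inferred
# precondition must not be stronger than the overridden method's, so
# candidate clauses violating that are dropped.

candidate_invariants = {
    "Base.put:pre":    ["key != null"],
    "Derived.put:pre": ["key != null", "key.length() > 0"],   # stronger: suspect
}

def enforce_weaker_precondition(candidates, base, derived):
    """Keep only derived-class precondition clauses already required by the base."""
    allowed = set(candidates[base])
    candidates[derived] = [c for c in candidates[derived] if c in allowed]

enforce_weaker_precondition(candidate_invariants, "Base.put:pre", "Derived.put:pre")
print(candidate_invariants["Derived.put:pre"])  # ['key != null']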
Towards a Regression Test Selection Technique for Message-Based Software Integration
Regression testing is essential to ensure software quality. Regression test-case selection is the process of ensuring that test cases made obsolete by changes to the system are not considered for further testing; this is the regression test-case selection problem. Although existing research has addressed many related problems, most existing regression test-case selection techniques cater to procedural systems, and, being academic, they lack the scalability and detail needed for multi-tier applications. Such techniques can be employed for procedural systems, usually mathematical applications. Enterprise applications have become complex and distributed, leading to component-based architectures, and inter-process communication has therefore become a central activity of any such system. Messaging is the most widely employed intermodule interaction mechanism, and today's systems, being heavily internet-dependent, are Web-Services based and use XML for messaging. We propose an RTS technique that is specifically targeted at enterprise applications.
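For context, the selection idea such an RTS technique builds on can be sketched as an intersection between the entities a test exercises and the entities a change touches; for a message-based system the entities might be service operations or message types. The names below are purely illustrative.

# Sketch: entity-based regression test selection (illustrative names only).
test_to_entities = {
    "testCreateOrder": {"OrderService.create", "msg:OrderRequest"},
    "testCancelOrder": {"OrderService.cancel", "msg:CancelRequest"},
    "testGetInvoice":  {"InvoiceService.get",  "msg:InvoiceRequest"},
}
changed_entities = {"msg:OrderRequest"}       # e.g. the XML message schema changed
deleted_entities = {"InvoiceService.get"}     # e.g. the operation was removed

selected = [t for t, ents in test_to_entities.items() if ents & changed_entities]
obsolete = [t for t, ents in test_to_entities.items() if ents & deleted_entities]
print(selected)  # ['testCreateOrder']  -- must be re-run
print(obsolete)  # ['testGetInvoice']   -- no longer applicable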
Automated concolic testing of smartphone apps
We present an algorithm and a system for generating input events to exercise smartphone apps. Our approach is based on concolic testing and generates sequences of events automatically and systematically. It alleviates the path-explosion problem by checking a condition on program executions that identifies subsumption between different event sequences. We also describe our implementation of the approach for Android, the most popular smartphone app platform, and the results of an evaluation that demonstrates its effectiveness on five Android apps.
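As a toy-scale illustration of systematic event-sequence exploration with subsumption-based pruning, consider the sketch below. The real approach checks a condition over concolic path constraints; here an abstract concrete state stands in for that check, and the app model is invented.

# Sketch: explore event sequences, pruning those subsumed by an earlier one.
from collections import deque

def explore(initial_state, events, step, max_depth=3):
    seen_states = set()                     # states already covered by some sequence
    worklist = deque([(initial_state, ())])
    explored = []
    while worklist:
        state, seq = worklist.popleft()
        if state in seen_states:            # subsumed: a prior sequence covers it
            continue
        seen_states.add(state)
        explored.append(seq)
        if len(seq) < max_depth:
            for e in events:
                worklist.append((step(state, e), seq + (e,)))
    return explored

# Toy app model: a counter that only "tap" changes, so sequences differing
# only in irrelevant events are pruned.
sequences = explore(0, ["tap", "rotate"], lambda s, e: s + 1 if e == "tap" else s)
print(len(sequences))   # 4, far fewer than the 15 sequences of length <= 3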
Source level debugging of dynamically translated programs
The capability to debug a program at the source level is useful and often indispensable. Debuggers use sophisticated techniques to provide a source view of a program, even though what is executing on the hardware is machine code. Debugging techniques evolve with significant changes in programming languages and execution environments. Recently, software dynamic translation (SDT) has emerged as a new execution mechanism. SDT inserts a run-time software layer between the program and the host machine, providing flexibility in execution and program monitoring. Increasingly popular technologies that use this mechanism include dynamic optimization, dynamic instrumentation, security checking, binary translation, and host machine virtualization. However, the run-time program modifications in an SDT environment pose significant challenges to a source level debugger, and debugging techniques do not currently exist for software dynamic translators. This thesis is the first to provide techniques for source level debugging of dynamically translated programs. The thesis proposes a novel debugging framework, called Tdb, that addresses the difficult challenge of maintaining and providing source level information for programs whose binary code changes as the program executes. The proposed framework has a number of important features. First, it does not require or induce changes in the program being debugged; in other words, programs are debugged in their deployment environment. Second, the framework is portable and can be applied to virtually any SDT system; it requires minimal changes to an SDT implementation, usually just a few lines of code. Third, the framework can be integrated with existing debuggers, such as Gdb, and does not require changes to these debuggers. This improves usability and adoption, eliminating the learning curve associated with a new debugging environment. Finally, the proposed techniques are efficient: the runtime overhead of the debugged programs is low and comparable to that of existing debuggers. Tdb's techniques have been implemented for three different dynamic translators, on two different hardware platforms. The experimental results demonstrate that source level debugging of dynamically translated programs is feasible, and our implemented systems are portable, usable, and efficient.
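As a hedged illustration of the kind of bookkeeping such a framework must maintain (not Tdb's actual data structures), a source level debugger in an SDT environment needs a mapping from translated code addresses back to original addresses, which ordinary debug information then maps to source locations. All tables and addresses below are made up.

# Sketch: translated PC -> original PC -> source location (illustrative only).
debug_info = {0x400010: ("main.c", 12), 0x400018: ("main.c", 13)}   # from the compiler
translation_map = {0x7F0000: 0x400010, 0x7F0020: 0x400018}          # from the translator

def source_location(translated_pc):
    original_pc = translation_map.get(translated_pc)
    return debug_info.get(original_pc)

print(source_location(0x7F0020))   # ('main.c', 13)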
Mangrove: an Inference-based Dynamic Invariant Mining for GPU Architectures
Likely invariants model properties that hold in the operating conditions of a computing system. Dynamic mining of invariants aims at extracting logic formulas representing such properties from the system execution traces, and it is widely used for verification of intellectual property (IP) blocks. Although the extracted formulas represent likely invariants that hold in the considered traces, there is no guarantee that they are true in general for the system under verification. As a consequence, to increase the probability that the mined invariants are true in general, dynamic mining has to be performed on large sets of representative execution traces. This makes the execution-based mining process of actual IP blocks very time-consuming due to the trace lengths and to the large sets of monitored signals. This article presents Mangrove, an efficient implementation of a dynamic invariant mining algorithm for GPU architectures. Mangrove exploits inference rules, which are applied at run time to filter invariants from the execution traces and, thus, to considerably reduce the problem complexity. Mangrove allows users to define invariant templates and, from these templates, it automatically generates kernels for parallel and efficient mining on GPU architectures. The article presents the tool, the analysis of its performance, and its comparison with the best state-of-the-art sequential and parallel implementations.
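As a CPU-side illustration of template-based invariant mining with one inference rule (Mangrove itself generates GPU kernels and uses its own templates; the signals and values here are invented), the sketch below checks comparison templates over traces and, once equality between two signals is established, derives the ordering invariants by inference instead of re-checking them.

# Sketch: mine comparison-template invariants from traces, with inference.
import itertools

def mine(traces, variables):
    """traces: list of dicts mapping signal name -> value at each sample."""
    invariants = []
    for a, b in itertools.combinations(variables, 2):
        if all(t[a] == t[b] for t in traces):
            invariants.append(f"{a} == {b}")
            # Inference rule: equality implies both orderings, so the
            # <= and >= templates need not be checked against the traces.
            invariants += [f"{a} <= {b}", f"{a} >= {b}"]
        elif all(t[a] <= t[b] for t in traces):
            invariants.append(f"{a} <= {b}")
    return invariants

trace = [{"req": 1, "ack": 1, "cnt": 3}, {"req": 0, "ack": 0, "cnt": 7}]
print(mine(trace, ["req", "ack", "cnt"]))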
Making Software More Reliable by Uncovering Hidden Dependencies
As software grows in size and complexity, it also becomes more interdependent. Multiple internal components often share state and data. Whether these dependencies are intentional or not, we have found that their mismanagement often poses several challenges to testing. This thesis seeks to make it easier to create reliable software by making testing more efficient and more effective through explicit knowledge of these hidden dependencies.
The first problem that this thesis addresses, reducing testing time, directly impacts the day-to-day work of every software developer. The frequency with which code can be built (compiled, tested, and packaged) directly impacts the productivity of developers: longer build times mean a longer wait before determining whether a change to the application being built was successful. We have discovered that in the case of some languages, such as Java, the vast majority of build time is spent running tests. Therefore, it is incredibly important to focus on approaches to accelerating testing, while simultaneously making sure that we do not inadvertently cause tests to fail erratically (i.e., become flaky).
Typical techniques for accelerating tests (like running only a subset of them, or running them in parallel) often can't be applied soundly, since there may be hidden dependencies between tests. While we might think that each test should be independent (i.e. that a test's outcome isn't influenced by the execution of another test), we and others have found many examples in real software projects where tests truly have these dependencies: some tests require others to run first, or else their outcome will change. Previous work has shown that these dependencies are often complicated, unintentional, and hidden from developers. We have built several systems, VMVM and ElectricTest, that detect different sorts of dependencies between tests and use that information to soundly reduce testing time by several orders of magnitude.
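As a toy illustration of the kind of hidden dependency described above (VMVM and ElectricTest operate on real JVM state, not on a shared dictionary like this), two tests that share state pass in one order and fail in the other; the test functions are hypothetical.

# Sketch: a hidden test-order dependency through shared state.
cache = {}                      # shared state that creates the dependency

def test_populate():
    cache["user"] = "alice"
    assert cache["user"] == "alice"

def test_read():
    # Passes only if test_populate ran first: a hidden dependency.
    assert cache.get("user") == "alice"

def run(tests):
    results = {}
    for t in tests:
        try:
            t()
            results[t.__name__] = "pass"
        except AssertionError:
            results[t.__name__] = "fail"
    return results

print(run([test_populate, test_read]))   # both pass
cache.clear()
print(run([test_read, test_populate]))   # test_read fails when run first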
In our first approach, Unit Test Virtualization, we reduce the overhead of isolating each unit test with a lightweight, virtualization-like container, preventing these dependencies from manifesting. Our realization of Unit Test Virtualization for Java, VMVM, eliminates the need to run each test in its own process, reducing test suite execution time by an average of 62% in our evaluation (compared to execution time when running each test in its own process).
However, not all test suites isolate their tests: in some, dependencies are allowed to occur between tests. In these cases, common test acceleration techniques such as test selection or test parallelization are unsound in the absence of dependency information. When dependencies go unnoticed, tests can unexpectedly fail when executed out of order, causing unreliable builds. Our second approach, ElectricTest, soundly identifies data dependencies between test cases, allowing for sound test acceleration.
To enable broader use of general dependency information for testing and other analyses, we created Phosphor, the first and only portable and performant dynamic taint tracking system for the JVM. Dynamic taint tracking is a form of data flow analysis that applies labels to variables and tracks all other variables derived from those labeled variables, propagating the labels. Taint tracking has many applications to software engineering and software testing, and in addition to our own work, researchers across the world are using Phosphor to build their own systems. Towards making testing more effective, we also created Pebbles, which makes it easy for developers to specify data-related test oracles on mobile devices by thinking in terms of high-level objects such as emails, notes, or pictures.
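As a standalone model of the core taint-tracking idea (Phosphor implements this transparently inside the JVM rather than with wrapper objects like the one below), labels attached to a value flow to every value derived from it.

# Sketch: labels propagate from operands to derived values (illustrative only).
class Tainted:
    def __init__(self, value, labels=frozenset()):
        self.value, self.labels = value, frozenset(labels)

    def __add__(self, other):
        ov = other.value if isinstance(other, Tainted) else other
        ol = other.labels if isinstance(other, Tainted) else frozenset()
        # The result of an operation carries the union of its operands' labels.
        return Tainted(self.value + ov, self.labels | ol)

password = Tainted("s3cret", {"SENSITIVE"})
greeting = Tainted("token=") + password        # the derived value stays labeled
print(greeting.value, greeting.labels)         # token=s3cret frozenset({'SENSITIVE'})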