Semi-automatic fault localization
One of the most expensive and time-consuming components of the debugging
process is locating the errors or faults. To locate faults, developers must identify
statements involved in failures and select suspicious statements that might contain
faults. In practice, this localization is done by developers in a tedious and manual
way, using only a single execution, targeting only one fault, and having a limited
perspective into a large search space.
The thesis of this research is that fault localization can be partially automated
with the use of commonly available dynamic information gathered from test-case
executions in a way that is effective, efficient, tolerant of test cases that pass but also
execute the fault, and scalable to large programs that potentially contain multiple
faults. The overall goal of this research is to develop effective and efficient fault
localization techniques that scale to programs of large size and with multiple faults.
There are three principal steps performed to reach this goal: (1) Develop practical
techniques for locating suspicious regions in a program; (2) Develop techniques to
partition test suites into smaller, specialized test suites to target specific faults; and
(3) Evaluate the usefulness and cost of these techniques.
In this dissertation, the difficulties and limitations of previous work in the area
of fault localization are explored. A technique, called Tarantula, is presented that
addresses these difficulties. Empirical evaluation of the Tarantula technique shows
that it is efficient and effective for many faults. The evaluation also demonstrates
that the Tarantula technique can lose effectiveness as the number of faults increases.
To address the loss of effectiveness for programs with multiple faults, supporting
techniques have been developed and are presented. The empirical evaluation of these
supporting techniques demonstrates that they can enable effective fault localization in
the presence of multiple faults. A new mode of debugging, called parallel debugging, is
developed and empirical evidence demonstrates that it can provide a savings in terms
of both total expense and time to delivery. A prototype visualization is provided to
display the fault-localization results as well as to provide a method to interact and
explore those results. Finally, a study on the effects of the composition of test suites
on fault localization is presented.
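The suspiciousness metric at the heart of Tarantula is compact enough to sketch. The following is a minimal illustration of the published formula (not the dissertation's implementation), assuming per-statement counts of the passing and failing tests that execute each statement:

```python
# Minimal sketch of the Tarantula suspiciousness formula (illustrative
# only, not the dissertation's implementation). Assumed inputs: per-
# statement counts of covering passing/failing tests, plus suite totals.

def tarantula(failed_cov, passed_cov, total_failed, total_passed):
    """Suspiciousness in [0, 1]; higher means more likely faulty."""
    fail_ratio = failed_cov / total_failed if total_failed else 0.0
    pass_ratio = passed_cov / total_passed if total_passed else 0.0
    if fail_ratio + pass_ratio == 0:
        return 0.0  # statement never executed by any test
    return fail_ratio / (fail_ratio + pass_ratio)

# A statement covered by 3 of 4 failing tests but only 1 of 10 passing
# tests scores ~0.88, ranking it near the top of the report.
print(tarantula(3, 1, 4, 10))
```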
Applying Formal Methods to Networking: Theory, Techniques and Applications
Despite its great importance, modern network infrastructure is remarkable for
the lack of rigor in its engineering. The Internet, which began as a research
experiment, was never designed to handle the users and applications it hosts
today. The lack of formalization of the Internet architecture meant limited
abstractions and modularity, especially for the control and management planes,
thus requiring a new protocol built from scratch for every new need. This led
to an unwieldy, ossified Internet architecture resistant to any attempt at
formal verification, and an Internet culture where expediency and pragmatism
are favored over formal correctness. Fortunately, recent work in the space of
clean-slate Internet design---especially the software-defined networking (SDN)
paradigm---offers the Internet community another chance to develop the right
kind of architecture and abstractions. This has also led to a great resurgence
of interest in applying formal methods to the specification, verification, and
synthesis of networking protocols and applications. In this paper, we present a
self-contained tutorial on the formidable amount of work that has been done in
formal methods, and a survey of its applications to networking.
Approaches to Interpreter Composition
In this paper, we compose six different Python and Prolog VMs into four pairwise
compositions: one using C interpreters; one running on the JVM; one using
meta-tracing interpreters; and one using a C interpreter and a meta-tracing
interpreter. We show that programs that cross the language barrier frequently
execute faster in a meta-tracing composition, and that meta-tracing imposes a
significantly lower overhead on composed programs relative to mono-language
programs.
Automated synthesis and debugging of declarative models in Alloy
In theory, formal specifications offer numerous benefits in developing more reliable software. In practice, however, the use of specifications is rather limited, and practitioners often consider them more trouble than they are worth. Indeed, manually writing detailed specifications using notations that have unfamiliar syntax and semantics can be a daunting task -- even for experienced programmers. We introduce a new automated approach for synthesis of desired specifications and debugging of faulty specifications using given examples that capture the essence of desired properties and serve as test cases. Our focus is on specifications written in the declarative language Alloy -- a first-order logic based on relations with transitive closure, and its SAT-based analysis engine. Our key insight is that a test-driven foundation enables modern approaches to synthesis and debugging of imperative code to serve as a basis for developing novel analogous techniques for declarative specifications. For synthesis, we build on equivalence in relational algebra and introduce techniques for generating candidate Alloy expressions. We also introduce a technique to complete a partial Alloy model with holes using constraint solving. For locating faults in buggy specifications, we build on mutation-based fault localization and introduce techniques for locating likely faulty nodes in the abstract syntax tree of the faulty specification. Moreover, we integrate our expression generation and fault localization techniques to introduce a technique for automated specification repair. We experimentally evaluate our techniques using several Alloy models as subjects, including those with real faults. The results show that our techniques are effective at synthesis and debugging of the subjects. We believe our techniques provide an important step towards increasing the role of formal specifications in developing more reliable software and realizing their promise.
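The mutation-based fault localization this work builds on can be outlined generically. Below is a rough Python sketch of the idea, not the authors' Alloy tooling; the helpers `mutants_of` and `run_test` are hypothetical placeholders supplied by the caller:

```python
# Generic outline of mutation-based fault localization (illustrative
# sketch only). `mutants_of` and `run_test` are hypothetical
# placeholders, not part of the authors' tool.

def rank_suspicious_nodes(ast_nodes, failing_tests, mutants_of, run_test):
    """Score each AST node by how well mutating it repairs failing tests."""
    scores = {}
    for node in ast_nodes:
        best = 0
        for mutant in mutants_of(node):
            # Count originally-failing tests that pass on this mutant.
            fixed = sum(1 for t in failing_tests if run_test(mutant, t))
            best = max(best, fixed)
        scores[node] = best
    # Nodes whose mutants fix the most failing tests are most likely faulty.
    return sorted(scores, key=scores.get, reverse=True)
```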
Enhancing Usability and Explainability of Data Systems
The recent growth of data science expanded its reach to an ever-growing user base of nonexperts, increasing the need for usability, understandability, and explainability in these systems. Enhancing usability makes data systems accessible to people with different skills and backgrounds alike, leading to democratization of data systems. Furthermore, proper understanding of data and data-driven systems is necessary for the users to trust the function of the systems that learn from data. Finally, data systems should be transparent: when a data system behaves unexpectedly or malfunctions, the users deserve a proper explanation of what caused the observed incident. Unfortunately, most existing data systems offer limited usability and support for explanations: these systems are usable only by experts with sound technical skills, and even expert users are hindered by the lack of transparency into the systems' inner workings and functions. The aim of my thesis is to bridge the usability gap between nonexpert users and complex data systems, aid all sorts of users, including the expert ones, in data and system understanding, and provide explanations that help reason about unexpected outcomes involving data systems. Specifically, my thesis has the following three goals: (1) enhancing usability of data systems for nonexperts, (2) enabling data understanding that can assist users in a variety of tasks such as achieving trust in data-driven machine learning, gaining data understanding, and data cleaning, and (3) explaining causes of unexpected outcomes involving data and data systems.
For enhancing usability, we focus on example-driven user intent discovery. We develop systems based on example-driven interactions in two different settings: querying relational databases and personalized document summarization. Towards data understanding, we develop a new data-profiling primitive that can characterize tuples for which a machine-learned model is likely to produce untrustworthy predictions. We also develop an explanation framework to explain the causes of such untrustworthy predictions. Additionally, this new data-profiling primitive enables interactive data cleaning. Finally, we develop two explanation frameworks, tailored to provide explanations in debugging data system components, including the data itself. The explanation frameworks focus on explaining the root cause of a concurrent application's intermittent failure and exposing issues in the data that cause a data-driven system to malfunction.
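As a toy illustration of such a data-profiling primitive (a sketch of the general idea only, not the technique developed in the thesis): learn simple per-column bounds from the training data, then flag serving tuples that fall outside those bounds as points where the model's predictions deserve less trust.

```python
import numpy as np

# Toy sketch only: approximate a "training data profile" by per-column
# min/max bounds, and flag tuples outside the profile as points where a
# model trained on this data is more likely to be untrustworthy.
# (The thesis's actual data-profiling primitive is richer than this.)

def fit_profile(train):
    """train: (n, d) array of training tuples."""
    return train.min(axis=0), train.max(axis=0)

def untrustworthy(profile, tuples):
    """True for tuples that violate the learned per-column bounds."""
    lo, hi = profile
    return np.any((tuples < lo) | (tuples > hi), axis=1)

train = np.array([[1.0, 10.0], [2.0, 12.0], [3.0, 11.0]])
serve = np.array([[2.5, 11.0], [9.0, 50.0]])
print(untrustworthy(fit_profile(train), serve))  # [False  True]
```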
Combining Fault Localization with Information Retrieval: An Analysis of Accuracy and Performance for Bug Finding
Debugging is a key activity in the software development process. It has been used extensively by developers to attempt to localize faults, while enhancing the quality and performance of software in general. There has been a significant amount of research on developing and enhancing fault localization techniques, which assist developers in locating faults within a body of code. However, identifying fault locations using individual techniques is not always effective; combining different techniques, which represent distinct forms of analysis, might help to overcome this issue. There has been very limited research suggesting that combining more than one approach to fault localization may have benefits, principally because information from different sources is included in the localization process. In this thesis, I attempt to more precisely address the question of whether combining different fault localization techniques can more effectively and efficiently find faults in code, when contrasted with a single technique. To answer this, I have carried out experiments that combine the use of three fault localization techniques: Information Retrieval (IR), Spectrum-Based Fault Localization (SBFL), and Text-Based Search. These techniques are representative of both dynamic and static fault localization. My hypothesis is that a combination of dynamic and static fault localization analysis can assist developers in better fault localization. I have evaluated the various combinations of techniques in identifying faults against real-world programs from the Defects4J benchmark, analyzing 395 faults and their bug reports. The experimental results demonstrate that the combination of three techniques (SBFL, Text Search, and IR) is the most accurate, locating 343 of the 395 faults (86.84% accuracy). This finding contributes positively towards concretely recommending techniques for assisting developers in locating faults in code. Guidelines are provided on which combination of techniques yields maximal accuracy, especially when there is no prior knowledge about the fault.
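One plausible way to combine such techniques is sketched below; the per-technique normalization, the equal weighting, and the example locations are all illustrative assumptions, not the thesis's actual combination strategy:

```python
# Illustrative sketch of combining fault-localization rankings (the
# thesis combines SBFL, IR, and text search; the normalization, the
# equal weights, and the locations here are made-up assumptions).

def combine_scores(per_technique_scores, weights=None):
    """per_technique_scores: list of {location: score} dicts, one per technique."""
    weights = weights or [1.0] * len(per_technique_scores)
    combined = {}
    for w, scores in zip(weights, per_technique_scores):
        top = max(scores.values())  # normalize so techniques are comparable
        for loc, s in scores.items():
            combined[loc] = combined.get(loc, 0.0) + w * (s / top if top else 0.0)
    return sorted(combined, key=combined.get, reverse=True)

sbfl = {"Foo.java:42": 0.9, "Bar.java:7": 0.4}   # spectrum-based scores
ir   = {"Foo.java:42": 3.1, "Baz.java:3": 2.2}   # retrieval similarity scores
print(combine_scores([sbfl, ir]))  # Foo.java:42 ranks first
```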
Automating Program Verification and Repair Using Invariant Analysis and Test Input Generation
Software bugs are a persistent feature of daily life---crashing web browsers, allowing cyberattacks, and distorting the results of scientific computations. One approach to improving software uses program invariants---mathematical descriptions of program behaviors---to verify code and detect bugs. Current invariant generation techniques lack support for complex yet important forms of invariants, such as general polynomial relations and properties of arrays. As a result, we lack the ability to conduct precise analysis of programs that use these common constructs. This dissertation presents DIG, a static and dynamic analysis framework for discovering several useful classes of program invariants, including (i) nonlinear polynomial relations, which are fundamental to many scientific applications; (ii) disjunctive invariants, which express branching behaviors in programs; and (iii) properties of multidimensional arrays, which appear in many practical applications. We describe theoretical and empirical results showing that DIG can efficiently and accurately find many important invariants in real-world uses, e.g., polynomial properties in numerical algorithms and array relations in a full AES encryption implementation. Automatic program verification and synthesis are long-standing problems in computer science. However, while there has been a great deal of work on program verification, program synthesis has received comparatively little. Consequently, important synthesis tasks, e.g., generating program repairs, remain difficult and time-consuming. This dissertation proves that certain formulations of verification and synthesis are equivalent, allowing for direct applications of techniques and tools between these two research areas. Based on these ideas, we develop CETI, a tool that leverages existing verification techniques and tools for automatic program repair. Experimental results show that CETI can achieve higher success rates than many other standard program repair methods.
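The dynamic side of such invariant inference can be shown in miniature. The sketch below is only in the spirit of DIG, not its actual algorithm: it builds degree-2 monomial terms over traced variable values and reads candidate polynomial equalities off the nullspace of the resulting trace matrix.

```python
import numpy as np

# Miniature dynamic equality-invariant inference (a sketch in the
# spirit of DIG, not its algorithm): any coefficient vector c in the
# nullspace of the term matrix gives a relation c . terms = 0 that
# every observed trace satisfies.

def equality_invariants(traces):
    """traces: (n_samples, n_vars) array of values observed at a program point."""
    x = np.asarray(traces, dtype=float)
    n, d = x.shape
    # Terms: 1, each variable, and all degree-2 products.
    cols = [np.ones(n)] + [x[:, i] for i in range(d)]
    cols += [x[:, i] * x[:, j] for i in range(d) for j in range(i, d)]
    A = np.column_stack(cols)
    _, s, vt = np.linalg.svd(A)
    rank = int((s > s.max() * 1e-9).sum())
    return vt[rank:]  # each row: coefficients of a relation all traces satisfy

# Traces from a point where y == x*x holds; term order is [1, x, y, x^2, x*y, y^2].
xs = np.array([[1, 1], [2, 4], [3, 9], [4, 16], [5, 25]])
print(equality_invariants(xs).round(3))  # one vector encoding y - x^2 = 0
```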
Doctor of Philosophy dissertation
A modern software system is a composition of parts that are themselves highly complex: operating systems, middleware, libraries, servers, and so on. In principle, compositionality of interfaces means that we can understand any given module independently of the internal workings of other parts. In practice, however, abstractions are leaky, and with every generation, modern software systems grow in complexity. Traditional ways of understanding failures, explaining anomalous executions, and analyzing performance are reaching their limits in the face of emergent behavior, unrepeatability, cross-component execution, software aging, and adversarial changes to the system at run time. Deterministic systems analysis has the potential to change the way we analyze and debug software systems. Recorded once, the execution of the system becomes an independent artifact, which can be analyzed offline. The availability of the complete system state, the guaranteed behavior of re-execution, and the absence of limitations on the run-time complexity of analysis collectively enable the deep, iterative, and automatic exploration of the dynamic properties of the system. This work creates a foundation for making deterministic replay a ubiquitous system analysis tool. It defines design and engineering principles for building fast and practical replay machines capable of capturing the complete execution of an entire operating system with an overhead of a few percent on a realistic workload and with minimal installation costs. To enable an intuitive interface for constructing replay analysis tools, this work implements a powerful virtual machine introspection layer that allows an analysis algorithm to be programmed against the state of the recorded system in the familiar terms of source-level variable and type names. To support performance analysis, the replay engine provides a faithful performance model of the original execution during replay.
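The record/replay principle behind this work can be shown in miniature. The toy below, nothing like the dissertation's whole-OS engine, logs the results of nondeterministic calls once and then replays the log, so re-execution is exactly repeatable and can be analyzed offline:

```python
import random
import time

# Miniature record/replay sketch: capture every nondeterministic input
# once, then feed the log back so re-execution is bit-for-bit
# repeatable. (Illustrative only; the dissertation records an entire OS.)

class Recorder:
    def __init__(self):
        self.log = []
    def call(self, fn, *args):
        result = fn(*args)        # perform the real nondeterministic call
        self.log.append(result)   # persist its result for replay
        return result

class Replayer:
    def __init__(self, log):
        self._log = iter(log)
    def call(self, fn, *args):
        return next(self._log)    # return the recorded result instead

def workload(env):
    # Any code whose nondeterminism is routed through env.call().
    return env.call(random.random) + env.call(time.time)

rec = Recorder()
original = workload(rec)
replayed = workload(Replayer(rec.log))
assert original == replayed  # replay reproduces the execution exactly
```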