11 research outputs found
Synthesizing framework models for symbolic execution
Symbolic execution is a powerful program analysis technique, but it is difficult to apply to programs built using frameworks such as Swing and Android, because the framework code itself is hard to symbolically execute. The standard solution is to manually create a framework model that can be symbolically executed, but developing and maintaining a model is difficult and error-prone. In this paper, we present Pasket, a new system that takes a first step toward automatically generating Java framework models to support symbolic execution. Pasket's focus is on creating models by instantiating design patterns. Pasket takes as input class, method, and type information from the framework API, together with tutorial programs that exercise the framework. From these artifacts and Pasket's internal knowledge of design patterns, Pasket synthesizes a framework model whose behavior on the tutorial programs matches that of the original framework. We evaluated Pasket by synthesizing models for subsets of Swing and Android. Our results show that the models derived by Pasket are sufficient to allow us to use off-the-shelf symbolic execution tools to analyze Java programs that rely on frameworks.National Science Foundation (U.S.) (CCF-1139021)National Science Foundation (U.S.) (CCF-1139056)National Science Foundation (U.S.) (CCF-1161775
JSKETCH: Sketching for Java
Sketch-based synthesis, epitomized by the SKETCH tool, lets developers
synthesize software starting from a partial program, also called a sketch or
template. This paper presents JSKETCH, a tool that brings sketch-based
synthesis to Java. JSKETCH's input is a partial Java program that may include
holes, which are unknown constants, expression generators, which range over
sets of expressions, and class generators, which are partial classes. JSKETCH
then translates the synthesis problem into a SKETCH problem; this translation
is complex because SKETCH is not object-oriented. Finally, JSKETCH synthesizes
an executable Java program by interpreting the output of SKETCH.Comment: This research was supported in part by NSF CCF-1139021, CCF- 1139056,
CCF-1161775, and the partnership between UMIACS and the Laboratory for
Telecommunication Science
Learning a Static Analyzer from Data
To be practically useful, modern static analyzers must precisely model the
effect of both, statements in the programming language as well as frameworks
used by the program under analysis. While important, manually addressing these
challenges is difficult for at least two reasons: (i) the effects on the
overall analysis can be non-trivial, and (ii) as the size and complexity of
modern libraries increase, so is the number of cases the analysis must handle.
In this paper we present a new, automated approach for creating static
analyzers: instead of manually providing the various inference rules of the
analyzer, the key idea is to learn these rules from a dataset of programs. Our
method consists of two ingredients: (i) a synthesis algorithm capable of
learning a candidate analyzer from a given dataset, and (ii) a counter-example
guided learning procedure which generates new programs beyond those in the
initial dataset, critical for discovering corner cases and ensuring the learned
analysis generalizes to unseen programs.
We implemented and instantiated our approach to the task of learning
JavaScript static analysis rules for a subset of points-to analysis and for
allocation sites analysis. These are challenging yet important problems that
have received significant research attention. We show that our approach is
effective: our system automatically discovered practical and useful inference
rules for many cases that are tricky to manually identify and are missed by
state-of-the-art, manually tuned analyzers
Active Learning of Points-To Specifications
When analyzing programs, large libraries pose significant challenges to
static points-to analysis. A popular solution is to have a human analyst
provide points-to specifications that summarize relevant behaviors of library
code, which can substantially improve precision and handle missing code such as
native code. We propose ATLAS, a tool that automatically infers points-to
specifications. ATLAS synthesizes unit tests that exercise the library code,
and then infers points-to specifications based on observations from these
executions. ATLAS automatically infers specifications for the Java standard
library, and produces better results for a client static information flow
analysis on a benchmark of 46 Android apps compared to using existing
handwritten specifications
Connecting Program Synthesis and Reachability: Automatic Program Repair using Test-Input Generation
We prove that certain formulations of program synthesis and reachability are equivalent. Specifically, our constructive proof shows the reductions between the template-based synthesis problem, which generates a program in a pre-specified form, and the reachability problem, which decides the reachability of a program location. This establishes a link between the two research fields and allows for the transfer of techniques and results between them.
To demonstrate the equivalence, we develop a program repair prototype using reachability tools. We transform a buggy program and its required specification into a specific program containing a location reachable only when the original program can be repaired, and then apply an off-the-shelf test-input generation tool on the transformed program to find test values to reach the desired location. Those test values correspond to repairs for the original program. Preliminary results suggest that our approach compares favorably to other repair methods
A Survey of Symbolic Execution Techniques
Many security and software testing applications require checking whether
certain properties of a program hold for any possible usage scenario. For
instance, a tool for identifying software vulnerabilities may need to rule out
the existence of any backdoor to bypass a program's authentication. One
approach would be to test the program using different, possibly random inputs.
As the backdoor may only be hit for very specific program workloads, automated
exploration of the space of possible inputs is of the essence. Symbolic
execution provides an elegant solution to the problem, by systematically
exploring many possible execution paths at the same time without necessarily
requiring concrete inputs. Rather than taking on fully specified input values,
the technique abstractly represents them as symbols, resorting to constraint
solvers to construct actual instances that would cause property violations.
Symbolic execution has been incubated in dozens of tools developed over the
last four decades, leading to major practical breakthroughs in a number of
prominent software reliability applications. The goal of this survey is to
provide an overview of the main ideas, challenges, and solutions developed in
the area, distilling them for a broad audience.
The present survey has been accepted for publication at ACM Computing
Surveys. If you are considering citing this survey, we would appreciate if you
could use the following BibTeX entry: http://goo.gl/Hf5FvcComment: This is the authors pre-print copy. If you are considering citing
this survey, we would appreciate if you could use the following BibTeX entry:
http://goo.gl/Hf5Fv
Learning to Accelerate Symbolic Execution via Code Transformation
Symbolic execution is an effective but expensive technique for automated test generation. Over the years, a large number of refined symbolic execution techniques have been proposed to improve its efficiency. However, the symbolic execution efficiency problem remains, and largely limits the application of symbolic execution in practice. Orthogonal to refined symbolic execution, in this paper we propose to accelerate symbolic execution through semantic-preserving code transformation on the target programs. During the initial stage of this direction, we adopt a particular code transformation, compiler optimization, which is initially proposed to accelerate program concrete execution by transforming the source program into another semantic-preserving target program with increased efficiency (e.g., faster or smaller). However, compiler optimizations are mostly designed to accelerate program concrete execution rather than symbolic execution. Recent work also reported that unified settings on compiler optimizations that can accelerate symbolic execution for any program do not exist at all. Therefore, in this work we propose a machine-learning based approach to tuning compiler optimizations to accelerate symbolic execution, whose results may also aid further design of specific code transformations for symbolic execution. In particular, the proposed approach LEO separates source-code functions and libraries through our program-splitter, and predicts individual compiler optimization (i.e., whether a type of code transformation is chosen) separately through analyzing the performance of existing symbolic execution. Finally, LEO applies symbolic execution on the code transformed by compiler optimization (through our local-optimizer). We conduct an empirical study on GNU Coreutils programs using the KLEE symbolic execution engine. The results show that LEO significantly accelerates symbolic execution, outperforming the default KLEE configurations (i.e., turning on/off all compiler optimizations) in various settings, e.g., with the default training/testing time, LEO achieves the highest line coverage in 50/68 programs, and its average improvement rate on all programs is 46.48%/88.92% in terms of line coverage compared with turning on/off all compiler optimizations
Lifestate: Event-Driven Protocols and Callback Control Flow
Developing interactive applications (apps) against event-driven software frameworks such as Android is notoriously difficult. To create apps that behave as expected, developers must follow complex and often implicit asynchronous programming protocols. Such protocols intertwine the proper registering of callbacks to receive control from the framework with appropriate application-programming interface (API) calls that in turn affect the set of possible future callbacks. An app violates the protocol when, for example, it calls a particular API method in a state of the framework where such a call is invalid. What makes automated reasoning hard in this domain is largely what makes programming apps against such frameworks hard: the specification of the protocol is unclear, and the control flow is complex, asynchronous, and higher-order. In this paper, we tackle the problem of specifying and modeling event-driven application-programming protocols. In particular, we formalize a core meta-model that captures the dialogue between event-driven frameworks and application callbacks. Based on this meta-model, we define a language called lifestate that permits precise and formal descriptions of application-programming protocols and the callback control flow imposed by the event-driven framework. Lifestate unifies modeling what app callbacks can expect of the framework with specifying rules the app must respect when calling into the framework. In this way, we effectively combine lifecycle constraints and typestate rules. To evaluate the effectiveness of lifestate modeling, we provide a dynamic verification algorithm that takes as input a trace of execution of an app and a lifestate protocol specification to either produce a trace witnessing a protocol violation or a proof that no such trace is realizable