17 research outputs found
Enhancing Reuse of Constraint Solutions to Improve Symbolic Execution
Constraint solution reuse is an effective approach to save the time of
constraint solving in symbolic execution. Most of the existing reuse approaches
are based on syntactic or semantic equivalence of constraints; e.g. the Green
framework is able to reuse constraints which have different representations but
are semantically equivalent, through canonizing constraints into syntactically
equivalent normal forms. However, syntactic/semantic equivalence is not a
necessary condition for reuse--some constraints are not syntactically or
semantically equivalent, but their solutions still have potential for reuse.
Existing approaches are unable to recognize and reuse such constraints.
In this paper, we present GreenTrie, an extension to the Green framework,
which supports constraint reuse based on the logical implication relations
among constraints. GreenTrie provides a component, called L-Trie, which stores
constraints and solutions into tries, indexed by an implication partial order
graph of constraints. L-Trie is able to carry out logical reduction and logical
subset and superset querying for given constraints, to check for reuse of
previously solved constraints. We report the results of an experimental
assessment of GreenTrie against the original Green framework, which shows that
our extension achieves better reuse of constraint solving result and saves
significant symbolic execution time.Comment: this paper has been submitted to conference ISSTA 201
Eventually Sound Points-To Analysis with Specifications
Static analyses make the increasingly tenuous assumption that all source code is available for analysis; for example, large libraries often call into native code that cannot be analyzed. We propose a points-to analysis that initially makes optimistic assumptions about missing code, and then inserts runtime checks that report counterexamples to these assumptions that occur during execution. Our approach guarantees eventual soundness, which combines two guarantees: (i) the runtime checks are guaranteed to catch the first counterexample that occurs during any execution, in which case execution can be terminated to prevent harm, and (ii) only finitely many counterexamples ever occur, implying that the static analysis eventually becomes statically sound with respect to all remaining executions. We implement Optix, an eventually sound points-to analysis for Android apps, where the Android framework is missing. We show that the runtime checks added by Optix incur low overhead on real programs, and demonstrate how Optix improves a client information flow analysis for detecting Android malware
Automated concolic testing of smartphone apps
We present an algorithm and a system for generating input events to exercise smartphone apps. Our approach is
based on concolic testing and generates sequences of events
automatically and systematically. It alleviates the path-explosion problem by checking a condition on program executions that identifies subsumption between different event sequences. We also describe our implementation of the approach for Android, the most popular smartphone app platform, and the results of an evaluation that demonstrates its
effectiveness on five Android apps
Towards optimal concolic testing
ACM Distinguished Paper Award</p
Targeted Greybox Fuzzing with Static Lookahead Analysis
Automatic test generation typically aims to generate inputs that explore new
paths in the program under test in order to find bugs. Existing work has,
therefore, focused on guiding the exploration toward program parts that are
more likely to contain bugs by using an offline static analysis.
In this paper, we introduce a novel technique for targeted greybox fuzzing
using an online static analysis that guides the fuzzer toward a set of target
locations, for instance, located in recently modified parts of the program.
This is achieved by first semantically analyzing each program path that is
explored by an input in the fuzzer's test suite. The results of this analysis
are then used to control the fuzzer's specialized power schedule, which
determines how often to fuzz inputs from the test suite. We implemented our
technique by extending a state-of-the-art, industrial fuzzer for Ethereum smart
contracts and evaluate its effectiveness on 27 real-world benchmarks. Using an
online analysis is particularly suitable for the domain of smart contracts
since it does not require any code instrumentation---instrumentation to
contracts changes their semantics. Our experiments show that targeted fuzzing
significantly outperforms standard greybox fuzzing for reaching 83% of the
challenging target locations (up to 14x of median speed-up)
Sound regular expression semantics for dynamic symbolic execution of JavaScript
Existing support for regular expressions in automated test generation or
verification tools is lacking. Common aspects of regular expression engines
found in mainstream programming languages, such as backreferences or greedy
matching, are commonly ignored or imprecisely approximated, leading to poor
test coverage or failed proofs. In this paper, we present the first complete
strategy to faithfully reason about regular expressions in the context of
symbolic execution, focusing on the operators found in JavaScript. We model
regular expression operations using string constraints and classical regular
expressions and use a refinement scheme to address the problem of matching
precedence and greediness. Our survey of over 400,000 JavaScript packages from
the NPM software repository shows that one fifth make use of complex regular
expressions features. We implemented our model in a dynamic symbolic execution
engine for JavaScript and evaluated it on over 1,000 Node.js packages
containing regular expressions, demonstrating that the strategy is effective
and can increase line coverage of programs by up to 30%Comment: This arXiv version (v4) contains fixes for some typographical errors
of the PLDI'19 version (the numbering of indices in Section 4.1 and the
example in Section 4.3
Techniques to facilitate symbolic execution of real-world programs
The overall goal of this research is to reduce the cost of software development and improve the quality of software. Symbolic execution is a program-analysis technique that is used to address several problems that arise in developing high-quality software. Despite the fact that the symbolic execution technique is well understood, and performing symbolic execution on simple programs is straightforward, it is still not possible to apply the technique to the general class of large, real-world software. A symbolic-execution system can be effectively applied to large, real-world software if it has at least the two features: efficiency and automation. However, efficient and automatic symbolic execution of real-world programs is a lofty goal because of both theoretical and practical reasons. Theoretically, achieving this goal requires solving an intractable problem (i.e., solving constraints). Practically, achieving this goal requires overwhelming effort to implement a symbolic-execution system that can precisely and automatically symbolically execute real-world programs.
This research makes three major contributions.
1. Three new techniques that address three important problems of symbolic execution. Compared to existing techniques, the new techniques
* reduce the manual effort that may be required to symbolically execute those programs that either generate complex constraints or parts of which cannot be symbolically executed due to limitations of a symbolic-execution system.
* improve the usefulness of symbolic execution (e.g., expose more bugs in a program) by enabling discovery of more feasible paths within a given time budget.
2. A novel approach that uses symbolic execution to generate test inputs for Apps that run on modern mobile devices such as smartphones and tablets.
3. Implementations of the above techniques and empirical results obtained from applying those techniques to real-world programs that demonstrate their effectiveness.PhDCommittee Chair: Mary Jean Harrold; Committee Member: Alessandro Orso; Committee Member: Mayur Naik; Committee Member: Santosh Pande; Committee Member: Willem Visse
Precise interface identification to improve testing and analysis of web applications
As web applications become more widespread, sophisticated, and complex, automated quality assurance techniques for such applications have grown in importance. Accurate interface identification is fundamental for many of these techniques, as the components of a web application communicate extensively via implicitly-defined interfaces to generate customized and dynamic content. However, current techniques for identifying web application interfaces can be incomplete or imprecise, which hinders the effectiveness of quality assurance techniques. To address these limitations, we present a new approach for identifying web application interfaces that is based on a specialized form of symbolic execution. In our empirical evaluation, we show that the set of interfaces identified by our approach is more accurate than those identified by other approaches. We also show that this increased accuracy leads to improvements in several important quality assurance techniques for web applications: test-input generation, penetration testing, and invocation verification