15,603 research outputs found
Badger: Complexity Analysis with Fuzzing and Symbolic Execution
Hybrid testing approaches that involve fuzz testing and symbolic execution
have shown promising results in achieving high code coverage, uncovering subtle
errors and vulnerabilities in a variety of software applications. In this paper
we describe Badger - a new hybrid approach for complexity analysis, with the
goal of discovering vulnerabilities which occur when the worst-case time or
space complexity of an application is significantly higher than the average
case. Badger uses fuzz testing to generate a diverse set of inputs that aim to
increase not only coverage but also a resource-related cost associated with
each path. Since fuzzing may fail to execute deep program paths due to its
limited knowledge about the conditions that influence these paths, we
complement the analysis with a symbolic execution, which is also customized to
search for paths that increase the resource-related cost. Symbolic execution is
particularly good at generating inputs that satisfy various program conditions
but by itself suffers from path explosion. Therefore, Badger uses fuzzing and
symbolic execution in tandem, to leverage their benefits and overcome their
weaknesses. We implemented our approach for the analysis of Java programs,
based on Kelinci and Symbolic PathFinder. We evaluated Badger on Java
applications, showing that our approach is significantly faster in generating
worst-case executions compared to fuzzing or symbolic execution on their own
A Compiler and Runtime Infrastructure for Automatic Program Distribution
This paper presents the design and the implementation of a compiler and runtime infrastructure for automatic program distribution. We are building a research infrastructure that enables experimentation with various program partitioning and mapping strategies and the study of automatic distribution's effect on resource consumption (e.g., CPU, memory, communication). Since many optimization techniques are faced with conflicting optimization targets (e.g., memory and communication), we believe that it is important to be able to study their interaction.
We present a set of techniques that enable flexible resource modeling and program distribution. These are: dependence analysis, weighted graph partitioning, code and communication generation, and profiling. We have developed these ideas in the context of the Java language. We present in detail the design and implementation of each of the techniques as part of our compiler and runtime infrastructure. Then, we evaluate our design and present preliminary experimental data for each component, as well as for the entire system
A Practical Blended Analysis for Dynamic Features in JavaScript
The JavaScript Blended Analysis Framework is designed to
perform a general-purpose, practical combined static/dynamic
analysis of JavaScript programs, while handling dynamic
features such as run-time generated code and variadic func-
tions. The idea of blended analysis is to focus static anal-
ysis on a dynamic calling structure collected at runtime in
a lightweight manner, and to rene the static analysis us-
ing additional dynamic information. We perform blended
points-to analysis of JavaScript with our framework and
compare results with those computed by a pure static points-
to analysis. Using JavaScript codes from actual webpages
as benchmarks, we show that optimized blended analysis
for JavaScript obtains good coverage (86.6% on average per
website) of the pure static analysis solution and nds ad-
ditional points-to pairs (7.0% on average per website) con-
tributed by dynamically generated/loaded code
A Survey of Symbolic Execution Techniques
Many security and software testing applications require checking whether
certain properties of a program hold for any possible usage scenario. For
instance, a tool for identifying software vulnerabilities may need to rule out
the existence of any backdoor to bypass a program's authentication. One
approach would be to test the program using different, possibly random inputs.
As the backdoor may only be hit for very specific program workloads, automated
exploration of the space of possible inputs is of the essence. Symbolic
execution provides an elegant solution to the problem, by systematically
exploring many possible execution paths at the same time without necessarily
requiring concrete inputs. Rather than taking on fully specified input values,
the technique abstractly represents them as symbols, resorting to constraint
solvers to construct actual instances that would cause property violations.
Symbolic execution has been incubated in dozens of tools developed over the
last four decades, leading to major practical breakthroughs in a number of
prominent software reliability applications. The goal of this survey is to
provide an overview of the main ideas, challenges, and solutions developed in
the area, distilling them for a broad audience.
The present survey has been accepted for publication at ACM Computing
Surveys. If you are considering citing this survey, we would appreciate if you
could use the following BibTeX entry: http://goo.gl/Hf5FvcComment: This is the authors pre-print copy. If you are considering citing
this survey, we would appreciate if you could use the following BibTeX entry:
http://goo.gl/Hf5Fv
Heap Abstractions for Static Analysis
Heap data is potentially unbounded and seemingly arbitrary. As a consequence,
unlike stack and static memory, heap memory cannot be abstracted directly in
terms of a fixed set of source variable names appearing in the program being
analysed. This makes it an interesting topic of study and there is an abundance
of literature employing heap abstractions. Although most studies have addressed
similar concerns, their formulations and formalisms often seem dissimilar and
some times even unrelated. Thus, the insights gained in one description of heap
abstraction may not directly carry over to some other description. This survey
is a result of our quest for a unifying theme in the existing descriptions of
heap abstractions. In particular, our interest lies in the abstractions and not
in the algorithms that construct them.
In our search of a unified theme, we view a heap abstraction as consisting of
two features: a heap model to represent the heap memory and a summarization
technique for bounding the heap representation. We classify the models as
storeless, store based, and hybrid. We describe various summarization
techniques based on k-limiting, allocation sites, patterns, variables, other
generic instrumentation predicates, and higher-order logics. This approach
allows us to compare the insights of a large number of seemingly dissimilar
heap abstractions and also paves way for creating new abstractions by
mix-and-match of models and summarization techniques.Comment: 49 pages, 20 figure
Interprocedural Data Flow Analysis in Soot using Value Contexts
An interprocedural analysis is precise if it is flow sensitive and fully
context-sensitive even in the presence of recursion. Many methods of
interprocedural analysis sacrifice precision for scalability while some are
precise but limited to only a certain class of problems.
Soot currently supports interprocedural analysis of Java programs using graph
reachability. However, this approach is restricted to IFDS/IDE problems, and is
not suitable for general data flow frameworks such as heap reference analysis
and points-to analysis which have non-distributive flow functions.
We describe a general-purpose interprocedural analysis framework for Soot
using data flow values for context-sensitivity. This framework is not
restricted to problems with distributive flow functions, although the lattice
must be finite. It combines the key ideas of the tabulation method of the
functional approach and the technique of value-based termination of call string
construction.
The efficiency and precision of interprocedural analyses is heavily affected
by the precision of the underlying call graph. This is especially important for
object-oriented languages like Java where virtual method invocations cause an
explosion of spurious call edges if the call graph is constructed naively. We
have instantiated our framework with a flow and context-sensitive points-to
analysis in Soot, which enables the construction of call graphs that are far
more precise than those constructed by Soot's SPARK engine.Comment: SOAP 2013 Final Versio
AMaĻoSāAbstract Machine for Xcerpt
Web query languages promise convenient and efficient access
to Web data such as XML, RDF, or Topic Maps. Xcerpt is one such Web
query language with strong emphasis on novel high-level constructs for
effective and convenient query authoring, particularly tailored to versatile
access to data in different Web formats such as XML or RDF.
However, so far it lacks an efficient implementation to supplement the
convenient language features. AMaĻoS is an abstract machine implementation
for Xcerpt that aims at efficiency and ease of deployment. It
strictly separates compilation and execution of queries: Queries are compiled
once to abstract machine code that consists in (1) a code segment
with instructions for evaluating each rule and (2) a hint segment that
provides the abstract machine with optimization hints derived by the
query compilation. This article summarizes the motivation and principles
behind AMaĻoS and discusses how its current architecture realizes
these principles
Heap Reference Analysis Using Access Graphs
Despite significant progress in the theory and practice of program analysis,
analysing properties of heap data has not reached the same level of maturity as
the analysis of static and stack data. The spatial and temporal structure of
stack and static data is well understood while that of heap data seems
arbitrary and is unbounded. We devise bounded representations which summarize
properties of the heap data. This summarization is based on the structure of
the program which manipulates the heap. The resulting summary representations
are certain kinds of graphs called access graphs. The boundedness of these
representations and the monotonicity of the operations to manipulate them make
it possible to compute them through data flow analysis.
An important application which benefits from heap reference analysis is
garbage collection, where currently liveness is conservatively approximated by
reachability from program variables. As a consequence, current garbage
collectors leave a lot of garbage uncollected, a fact which has been confirmed
by several empirical studies. We propose the first ever end-to-end static
analysis to distinguish live objects from reachable objects. We use this
information to make dead objects unreachable by modifying the program. This
application is interesting because it requires discovering data flow
information representing complex semantics. In particular, we discover four
properties of heap data: liveness, aliasing, availability, and anticipability.
Together, they cover all combinations of directions of analysis (i.e. forward
and backward) and confluence of information (i.e. union and intersection). Our
analysis can also be used for plugging memory leaks in C/C++ languages.Comment: Accepted for printing by ACM TOPLAS. This version incorporates
referees' comment
- ā¦