Interprocedural Data Flow Analysis in Soot using Value Contexts
An interprocedural analysis is precise if it is flow sensitive and fully
context-sensitive even in the presence of recursion. Many methods of
interprocedural analysis sacrifice precision for scalability, while others are
precise but limited to only a certain class of problems.
Soot currently supports interprocedural analysis of Java programs using graph
reachability. However, this approach is restricted to IFDS/IDE problems, and is
not suitable for general data flow frameworks such as heap reference analysis
and points-to analysis which have non-distributive flow functions.
We describe a general-purpose interprocedural analysis framework for Soot
using data flow values for context-sensitivity. This framework is not
restricted to problems with distributive flow functions, although the lattice
must be finite. It combines the key ideas of the tabulation method of the
functional approach and the technique of value-based termination of call string
construction.
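The combination described above can be illustrated with a toy tabulation loop (a sketch under assumed simplifications, not Soot's or the paper's actual API): each procedure's exit value is memoized per entry data-flow value, and recursion is handled by iterating the set of value contexts to a fixpoint, which terminates because the lattice of values is finite. The parity analysis below is a hypothetical example.

```python
def analyze_program(body, main, entry, bottom):
    """body maps each procedure name to a flow function
    f(entry_value, analyze), where analyze(callee, v) yields the
    callee's exit value for entry value v. Contexts are identified
    by their entry data-flow value, so termination only requires
    the lattice of values to be finite."""
    cache = {(main, entry): bottom}   # (proc, entry value) -> exit value

    def analyze(proc, v):             # used by flow functions at call sites
        cache.setdefault((proc, v), bottom)
        return cache[(proc, v)]

    changed = True
    while changed:                    # iterate all contexts to a fixpoint
        changed = False
        for proc, v in list(cache):
            before = len(cache)
            new = body[proc](v, analyze)
            if new != cache[(proc, v)] or len(cache) != before:
                cache[(proc, v)] = new
                changed = True
    return cache[(main, entry)]

# Hypothetical parity analysis: "inc" flips even/odd, "main" calls it twice.
parity = {
    "main": lambda v, an: an("inc", an("inc", v)),
    "inc":  lambda v, an: v if v == "unknown" else
            ("odd" if v == "even" else "even"),
}
assert analyze_program(parity, "main", "even", "unknown") == "even"
```

Because contexts are keyed by values rather than call strings, two call sites reaching "inc" with the same entry value share one cached summary.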
The efficiency and precision of interprocedural analyses are heavily affected
by the precision of the underlying call graph. This is especially important for
object-oriented languages like Java where virtual method invocations cause an
explosion of spurious call edges if the call graph is constructed naively. We
have instantiated our framework with a flow and context-sensitive points-to
analysis in Soot, which enables the construction of call graphs that are far
more precise than those constructed by Soot's SPARK engine.
Comment: SOAP 2013 Final Version
Heap Reference Analysis Using Access Graphs
Despite significant progress in the theory and practice of program analysis,
analysing properties of heap data has not reached the same level of maturity as
the analysis of static and stack data. The spatial and temporal structure of
stack and static data is well understood while that of heap data seems
arbitrary and is unbounded. We devise bounded representations which summarize
properties of the heap data. This summarization is based on the structure of
the program which manipulates the heap. The resulting summary representations
are certain kinds of graphs called access graphs. The boundedness of these
representations and the monotonicity of the operations to manipulate them make
it possible to compute them through data flow analysis.
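The boundedness idea can be rendered as a toy data structure (illustrative only; the names and the (field, statement) keying below are assumptions, not the paper's formal definitions): since graph nodes are keyed by a field dereferenced at a particular program statement, a loop that keeps extending an access path folds into a cycle instead of growing the graph.

```python
class AccessGraph:
    """A toy access graph: nodes are (field, statement) pairs, so the
    graph is bounded by fields x statements even though the access
    paths it summarizes (x.next.next...) are unbounded."""

    def __init__(self, root):
        self.root = root
        self.edges = set()       # pairs of (field, stmt) nodes

    def extend(self, frontier, field, stmt):
        """Append a field dereference made at `stmt`; reusing the
        (field, stmt) node turns a loop into a cycle, not new nodes."""
        node = (field, stmt)
        for f in frontier:
            self.edges.add((f, node))
        return [node]

g = AccessGraph("x")
front = [("root", "s0")]
for _ in range(100):             # a loop executing x = x.next at statement s1
    front = g.extend(front, "next", "s1")
assert len(g.edges) == 2         # bounded despite 100 extensions
```

The second edge is the self-loop on the ("next", "s1") node, which is the summary of arbitrarily long chains of `next` dereferences produced by that statement.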
An important application which benefits from heap reference analysis is
garbage collection, where currently liveness is conservatively approximated by
reachability from program variables. As a consequence, current garbage
collectors leave a lot of garbage uncollected, a fact which has been confirmed
by several empirical studies. We propose the first ever end-to-end static
analysis to distinguish live objects from reachable objects. We use this
information to make dead objects unreachable by modifying the program. This
application is interesting because it requires discovering data flow
information representing complex semantics. In particular, we discover four
properties of heap data: liveness, aliasing, availability, and anticipability.
Together, they cover all combinations of directions of analysis (i.e. forward
and backward) and confluence of information (i.e. union and intersection). Our
analysis can also be used for plugging memory leaks in C/C++ programs.
Comment: Accepted for printing by ACM TOPLAS. This version incorporates referees' comments.
Enforcing Programming Guidelines with Region Types and Effects
We present in this paper a new type and effect system for Java which can be
used to ensure adherence to guidelines for secure web programming. The system
is based on the region and effect system by Beringer, Grabowski, and Hofmann.
It improves upon that system by being parametrized over an arbitrary guideline supplied
in the form of a finite monoid or automaton and a type annotation or mockup
code for external methods. Furthermore, we add a powerful type inference based
on precise interprocedural analysis and provide an implementation in the Soot
framework which has been tested on a number of benchmarks including large parts
of the Stanford SecuriBench.
Comment: long version of APLAS'17 paper
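The guideline-as-automaton idea can be sketched as follows (a toy under assumed simplifications, not the paper's type system): each operation's effect is a transition function on the automaton's states, effects compose along the program, and code adheres to the guideline if no composed effect reaches the error state. The "sanitize before output" guideline below is an illustrative example.

```python
def effect(event):
    """Return the transition function of one operation on the
    hypothetical guideline automaton (states: clean/tainted/error)."""
    table = {
        "read_input": {"clean": "tainted", "tainted": "tainted"},
        "sanitize":   {"clean": "clean",   "tainted": "clean"},
        "output":     {"clean": "clean",   "tainted": "error"},
    }
    return lambda s: table[event].get(s, "error")

def run(events, start="clean"):
    """Compose effects along a straight-line event sequence."""
    s = start
    for e in events:
        s = effect(e)(s)
    return s

assert run(["read_input", "sanitize", "output"]) == "clean"
assert run(["read_input", "output"]) == "error"   # guideline violation
```

Since the transition functions over a finite state set form a finite monoid, the same check can be phrased as membership of the composed effect in a set of admissible monoid elements, which is what makes the system parametric in the guideline.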
Program Tailoring: Slicing by Sequential Criteria
Protocol and typestate analyses often report some sequences of
statements ending at a program point P that needs to be
scrutinized, since P may be erroneous or imprecisely
analyzed. Program slicing focuses only on the behavior at P by
computing a slice of the program affecting the values at P. In
this paper, we propose to restrict our attention to the subset of
that behavior at P affected by one or several statement
sequences, called a sequential criterion (SC). By leveraging the
ordering information in an SC, e.g., the temporal order in a few
valid/invalid API method invocation sequences, we introduce a
new technique, program tailoring, to compute a tailored program
that comprises the statements in all possible execution paths
passing through at least one sequence in SC in the given
order. With a prototyping implementation, Tailor, we show why
tailoring is practically useful by conducting two case studies on
seven large real-world Java applications. For program
debugging and understanding, Tailor can complement program
slicing by removing SC-irrelevant statements. For program
analysis, Tailor can enable a pointer analysis that does not scale
to a whole program to perform a more focused, and therefore
potentially scalable, analysis of the program's specific parts
containing hard language features such as reflection.
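A minimal sketch of the tailoring computation (my simplification on a plain digraph; real Tailor works on interprocedural CFGs): track progress through the sequence as an automaton state, take the product with the CFG, and keep exactly the statements that lie on some path completing the sequence in order.

```python
def tailor(cfg, entry, seq):
    """Toy tailoring: keep statements lying on some path from `entry`
    that visits the statements in `seq` in the given order."""
    k = len(seq)

    def advance(i, n):                 # progress through seq after visiting n
        return i + 1 if i < k and n == seq[i] else i

    # Forward pass: reachable (node, progress) pairs in the product graph.
    start = (entry, advance(0, entry))
    fwd, work = {start}, [start]
    while work:
        n, i = work.pop()
        for m in cfg.get(n, []):
            s = (m, advance(i, m))
            if s not in fwd:
                fwd.add(s)
                work.append(s)

    # Backward pass: pairs that can still complete the whole sequence.
    rev = {}
    for n, succs in cfg.items():
        for m in succs:
            rev.setdefault(m, []).append(n)

    def pred_progress(n, i):           # progress values j with advance(j, n) == i
        out = []
        if i >= 1 and n == seq[i - 1]:
            out.append(i - 1)
        if i == k or n != seq[i]:
            out.append(i)
        return out

    good = {s for s in fwd if s[1] == k}
    work = list(good)
    while work:
        n, i = work.pop()
        for p in rev.get(n, []):
            for j in pred_progress(n, i):
                if (p, j) in fwd and (p, j) not in good:
                    good.add((p, j))
                    work.append((p, j))
    return {n for n, _ in good}

# entry->a->b matches the sequence [a, b]; entry->c->b does not,
# so statement c is pruned from the tailored program.
cfg = {"entry": ["a", "c"], "a": ["b"], "c": ["b"], "b": []}
assert tailor(cfg, "entry", ["a", "b"]) == {"entry", "a", "b"}
```

Note how this differs from a backward slice from `b`, which would also retain `c`: the ordering information in the sequential criterion is what prunes the non-conforming path.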
Sound and Precise Malware Analysis for Android via Pushdown Reachability and Entry-Point Saturation
We present Anadroid, a static malware analysis framework for Android apps.
Anadroid exploits two techniques to soundly raise precision: (1) it uses a
pushdown system to precisely model dynamically dispatched interprocedural and
exception-driven control-flow; (2) it uses Entry-Point Saturation (EPS) to
soundly approximate all possible interleavings of asynchronous entry points in
Android applications. (It also integrates static taint-flow analysis and least
permissions analysis to expand the class of malicious behaviors which it can
catch.) Anadroid provides rich user interface support for human analysts, who
must ultimately rule on the "maliciousness" of a behavior.
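Entry-Point Saturation can be sketched as a simple fixpoint loop (a toy under assumed simplifications, not Anadroid's implementation; the entry points and facts below are hypothetical): every entry point is re-analyzed against a shared abstract store until the store stops growing, which over-approximates all interleavings of the entry points.

```python
def saturate(entry_points, store):
    """entry_points: functions taking the abstract store and returning
    the set of facts they produce under it. Re-run all entry points
    until no new fact appears (saturation)."""
    changed = True
    while changed:
        changed = False
        for ep in entry_points:
            new = ep(store) - store
            if new:
                store |= new
                changed = True
    return store

# Two hypothetical Android-style entry points that feed each other's facts:
on_create = lambda s: {"ui_ready"} | ({"net"} if "perm" in s else set())
on_click  = lambda s: {"perm"} if "ui_ready" in s else set()
assert saturate([on_create, on_click], set()) == {"ui_ready", "perm", "net"}
```

Because the store only grows and facts are drawn from a finite set, the loop terminates, and no ordering of the entry points is missed.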
To demonstrate the effectiveness of Anadroid's malware analysis, we had teams
of analysts analyze a challenge suite of 52 Android applications released as
part of the Automated Program Analysis for Cybersecurity (APAC) DARPA
program. The first team analyzed the apps using a version of Anadroid that
uses traditional (finite-state-machine-based) control-flow-analysis found in
existing malware analysis tools; the second team analyzed the apps using a
version of Anadroid that uses our enhanced pushdown-based
control-flow-analysis. We measured machine analysis time, human analyst time,
and their accuracy in flagging malicious applications. With pushdown analysis,
we found statistically significant (p < 0.05) decreases in time: from 85
minutes per app to 35 minutes per app in human plus machine analysis time; and
statistically significant (p < 0.05) increases in accuracy with the
pushdown-driven analyzer: from 71% correct identification to 95% correct
identification.
Comment: Appears in 3rd Annual ACM CCS Workshop on Security and Privacy in Smartphones and Mobile Devices (SPSM'13), Berlin, Germany, 2013.
Detecting Semantic Conflicts using Static Analysis
Version control system tools empower developers to independently work on
their development tasks. These tools also facilitate the integration of changes
through merging operations, and report textual conflicts. However, when
developers integrate their changes, they might encounter other types of
conflicts that are not detected by current merge tools. In this paper, we focus
on dynamic semantic conflicts, which occur when merging reports no textual
conflicts but results in undesired interference - causing unexpected program
behavior at runtime. To address this issue, we propose a technique that
explores the use of static analysis to detect interference when merging
contributions from two developers. We evaluate our technique using a dataset of
99 experimental units extracted from merge scenarios. The results provide
evidence that our technique presents significant interference detection
capability. It outperforms, in terms of F1 score and recall, previous methods
that rely on dynamic analysis for detecting semantic conflicts, although those
methods show better precision. Our technique's precision is comparable to that
observed in other studies that also leverage static analysis or use theorem
proving techniques to detect semantic conflicts, albeit with significantly
improved overall performance.
Boomerang: Demand-Driven Flow- and Context-Sensitive Pointer Analysis for Java
Many current program analyses require highly precise pointer
information about small, targeted parts of a given program. This
motivates the need for demand-driven pointer analyses that compute
information only where required. Pointer analyses generally compute
points-to sets of program variables or answer boolean alias
queries. However, many client analyses require richer pointer
information. For example, taint and typestate analyses often need to
know the set of all aliases of a given variable under a certain
calling context. With most current pointer analyses, clients must
compute such information through repeated points-to or alias queries, which increases complexity and computation time.
This paper presents Boomerang, a demand-driven, flow-, field-, and
context-sensitive pointer analysis for Java programs. Boomerang
computes rich results that include both the possible allocation sites of a given pointer (points-to information) and all pointers that can point to those allocation sites (alias information). For increased precision and scalability, clients can query Boomerang with respect to particular calling contexts of interest.
Our experiments show that Boomerang is more precise than existing
demand-driven pointer analyses. Additionally, using Boomerang, the
taint analysis FlowDroid issues up to 29.4x fewer pointer queries
compared to using other pointer analyses that return simpler pointer
information. Furthermore, the search space of Boomerang can be
significantly reduced by requesting calling contexts from the client
analysis.
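The "rich results" idea can be illustrated with a toy query over a points-to relation (this is not Boomerang's API; the map and variable names are hypothetical): a single query for a variable yields both its possible allocation sites and every variable that may alias it.

```python
def query(points_to, v):
    """Return (allocation sites of v, variables that may alias v),
    derived from one may-points-to relation."""
    allocs = points_to[v]
    aliases = {w for w, objs in points_to.items()
               if w != v and objs & allocs}
    return allocs, aliases

# Hypothetical points-to sets keyed by allocation site labels:
pts = {"x": {"new@7"}, "y": {"new@7", "new@9"}, "z": {"new@9"}}
allocs, aliases = query(pts, "x")
assert allocs == {"new@7"} and aliases == {"y"}
```

A client such as a taint analysis can then track the whole alias set at once instead of issuing one boolean alias query per variable pair.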
Efficient and Effective Handling of Exceptions in Java Points-To Analysis
A joint points-to and exception analysis has been shown to yield benefits in both precision and performance. Treating exceptions as regular objects,
however, incurs significant and rather unexpected overhead. We show that in a
typical joint analysis most of the objects computed to flow in and out of a method
are due to exceptional control-flow and not normal call-return control-flow. For
instance, a context-insensitive analysis of the Antlr benchmark from the DaCapo
suite computes 4-5 times more objects going in or out of a method due to exceptional control-flow than due to normal control-flow. As a consequence, the
analysis spends a large amount of its time considering exceptions.
We show that the problem can be addressed both effectively and elegantly by
coarsening the representation of exception objects. An interesting finding is
that, instead of recording each distinct exception object, we can collapse all
exceptions of the same type, and use one representative object per type, to
yield nearly identical precision (a loss of less than 0.1%) but with a
performance boost of at least 50% for most analyses and benchmarks, and large
space savings (usually 40% or more).
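The coarsening can be sketched in a few lines (illustrative only; the class and site names are hypothetical, and the actual analyses are Datalog-based): all allocations of the same exception type map to one canonical representative object, so distinct throw sites stop multiplying the objects flowing along exceptional control-flow.

```python
class ExceptionAbstraction:
    """Toy abstraction: one representative abstract object per
    exception type, regardless of how many sites allocate it."""

    def __init__(self):
        self._reps = {}          # exception type name -> representative

    def abstract(self, alloc_site, exc_type):
        # Distinct allocation sites of one type collapse to one object;
        # alloc_site is deliberately ignored by this abstraction.
        return self._reps.setdefault(exc_type, f"<{exc_type}-rep>")

pts = ExceptionAbstraction()
a = pts.abstract("Foo.java:10", "IOException")
b = pts.abstract("Bar.java:99", "IOException")
c = pts.abstract("Baz.java:7", "NullPointerException")
assert a == b and a != c     # same type -> one object; types stay distinct
```

Keeping the type distinction is what preserves precision for catch-clause matching while shrinking the exceptional points-to sets.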
Generalized Points-to Graphs: A New Abstraction of Memory in the Presence of Pointers
Flow- and context-sensitive points-to analysis is difficult to scale; for
top-down approaches, the problem centers on repeated analysis of the same
procedure; for bottom-up approaches, the abstractions used to represent
procedure summaries have not scaled while preserving precision.
We propose a novel abstraction called the Generalized Points-to Graph (GPG)
which views points-to relations as memory updates and generalizes them using
the counts of indirection levels leaving the unknown pointees implicit. This
allows us to construct GPGs as compact representations of bottom-up procedure
summaries in terms of memory updates and control flow between them. Their
compactness is ensured by the following optimizations: strength reduction
reduces the indirection levels, redundancy elimination removes redundant memory
updates and minimizes control flow (without over-approximating data dependence
between memory updates), and call inlining enhances the opportunities of these
optimizations. We devise novel operations and data flow analyses for these
optimizations.
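One strength-reduction step can be sketched concretely (a toy rendering with simplified composition rules, not the paper's full GPG operations): an edge (x, i, y, j) stands for "the i-th pointees of x are assigned the j-th pointees of y", so x = &a is (x, 1, a, 0), x = y is (x, 1, y, 1), and *x = y is (x, 2, y, 1). Knowing a pointee lets us lower an indirection level.

```python
def reduce_edge(edge, known):
    """If the edge reads or writes through a pointer whose pointee is
    known via a (p, 1, a, 0) edge, lower its indirection level."""
    (x, i, y, j) = edge
    for (p, pi, a, pj) in known:
        if p == x and pi == 1 and pj == 0 and i >= 2:
            return (a, i - 1, y, j)   # write through x becomes write to a
        if p == y and pi == 1 and pj == 0 and j >= 2:
            return (x, i, a, j - 1)   # read through y becomes read of a
    return edge

# x = &a followed by *x = y: strength reduction turns the second
# statement's edge (x,2,y,1) into the direct assignment a = y.
assert reduce_edge(("x", 2, "y", 1), [("x", 1, "a", 0)]) == ("a", 1, "y", 1)
```

In the real analysis such reductions are interleaved with redundancy elimination, so a procedure summary keeps only the memory updates that a caller can still observe.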
Our quest for scalability of points-to analysis leads to the following
insight: The real killer of scalability in program analysis is not the amount
of data but the amount of control flow that it may be subjected to in search of
precision. The effectiveness of GPGs lies in the fact that they discard as much
control flow as possible without losing precision (i.e., by preserving data
dependence without over-approximation). This is the reason why the GPGs are
very small even for main procedures that contain the effect of the entire
program. This allows our implementation to scale to 158 kLoC for C programs.