A Unified Model for Context-Sensitive Program Analyses: The Blind Men and the Elephant
Context-sensitive methods of program analysis increase the precision
of interprocedural analysis by achieving the effect of call inlining.
These methods have been defined using different formalisms and hence
appear as algorithms that are very different from each other. Some
methods traverse a call graph top-down whereas some others traverse
it bottom-up first and then top-down. Some define contexts explicitly
whereas some do not. Some of them directly compute data flow values
while some first compute summary functions and then use them to compute
data flow values. Further, different methods place different kinds
of restrictions on the data flow frameworks supported by them. As a
consequence, it is difficult to compare the ideas behind these methods
in spite of the fact that they solve essentially the same problem. We
argue that these incomparable views are similar to those of blind men
describing an
elephant called context sensitivity, and make it difficult for a
non-expert reader to form a coherent picture of context-sensitive data
flow analysis.
We bring out this whole-elephant view of context sensitivity in
program analysis by proposing a unified model of context sensitivity
which provides a clean separation between computation of contexts and
computation of data flow values.
Our model captures the essence of context sensitivity and
defines simple soundness
and precision criteria for context-sensitive methods.
It facilitates declarative
specifications of context-sensitive methods,
insightful comparisons between them,
and reasoning about their soundness and precision.
We demonstrate this by instantiating our model to
many known context-sensitive methods.
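The separation between computing contexts and computing data flow values can be illustrated with a toy example. The following is a minimal sketch, not the unified model from the paper: it assumes k-limited call strings as the context abstraction and constant propagation as the data flow problem, and the program, `push`, `join`, and `analyze` are all hypothetical.

```python
from typing import Dict, Optional, Tuple, Union

TOP = "TOP"  # lattice top: "not a single constant"
Value = Union[int, str]
Context = Tuple[str, ...]

def join(a: Optional[Value], b: Value) -> Value:
    """Constant-propagation join: equal constants survive, anything else is TOP."""
    if a is None:
        return b
    return a if a == b else TOP

def push(ctx: Context, site: str, k: int) -> Context:
    """k-limited call strings: keep only the k most recent call sites."""
    return (ctx + (site,))[-k:] if k > 0 else ()

# Hypothetical toy program: `main` calls an identity procedure `id` from two
# call sites, passing a different constant argument at each.
CALL_SITES: Dict[str, int] = {"s1": 1, "s2": 2}

def analyze(k: int) -> Dict[str, Value]:
    # Phase 1: compute the contexts in which `id` is analyzed (no data flow values needed).
    contexts = {site: push((), site, k) for site in CALL_SITES}
    # Phase 2: compute the value of `id`'s parameter separately for each context.
    param: Dict[Context, Value] = {}
    for site, ctx in contexts.items():
        param[ctx] = join(param.get(ctx), CALL_SITES[site])
    # `id` returns its parameter, so each call site reads the value of its own context.
    return {site: param[ctx] for site, ctx in contexts.items()}
```

With `k = 0` (context-insensitive) both calls share one context and the result is `TOP` at both sites; with `k = 1` each call site keeps its own constant, which is the effect of inlining that context sensitivity aims for.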
Parameterized Algorithms for Scalable Interprocedural Data-flow Analysis
Data-flow analysis is a general technique used to compute information of
interest at different points of a program and is considered to be a cornerstone
of static analysis. In this thesis, we consider interprocedural data-flow
analysis as formalized by the standard IFDS framework, which can express many
widely-used static analyses such as reaching definitions, live variables, and
null-pointer analysis. We focus on the well-studied on-demand setting in which queries
arrive one-by-one in a stream and each query should be answered as fast as
possible. While the classical IFDS algorithm provides a polynomial-time
solution to this problem, it is not scalable in practice. Specifically, it
either requires a quadratic-time preprocessing phase or takes linear time per
query, both of which are untenable for modern huge codebases with hundreds of
thousands of lines. Previous works have already shown that parameterizing the
problem by the treewidth of the program's control-flow graph is promising and
can lead to significant gains in efficiency. Unfortunately, these results were
only applicable to the limited special case of same-context queries.
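IFDS answers a query of the form "does fact d hold at node n?" by reachability in the exploded supergraph, whose nodes are (program point, fact) pairs. The following minimal sketch assumes a single hypothetical procedure without calls and uses reaching definitions as the client; the interprocedural version must additionally match calls with returns, which is omitted here. Answering each query with a fresh search, as the hypothetical `holds` function does, corresponds to the linear-time-per-query option described above.

```python
from collections import deque
from typing import Dict, Optional, Set, Tuple

ZERO = "0"  # the IFDS "zero" fact, reachable wherever the entry can reach

# Hypothetical straight-line procedure for reaching definitions:
# node -> (variable defined at that node, successor node)
CFG: Dict[str, Tuple[Optional[str], Optional[str]]] = {
    "n0": ("x", "n1"),   # x = 1
    "n1": ("y", "n2"),   # y = 2
    "n2": ("x", "n3"),   # x = 3
    "n3": (None, None),  # exit
}

def flow(node: str, fact: str) -> Set[str]:
    """Targets of the exploded-supergraph edges leaving (node, fact)."""
    var, _ = CFG[node]
    if var is None:
        return {fact}                   # identity flow function
    if fact == ZERO:
        return {ZERO, f"{var}@{node}"}  # generate the new definition
    if fact.split("@")[0] == var:
        return set()                    # kill older definitions of var
    return {fact}                       # unrelated facts pass through

def holds(query_node: str, query_fact: str, entry: str = "n0") -> bool:
    """Answer one query by BFS from (entry, ZERO); facts hold on entry to a node."""
    start = (entry, ZERO)
    seen = {start}
    work = deque([start])
    while work:
        node, fact = work.popleft()
        if (node, fact) == (query_node, query_fact):
            return True
        succ = CFG[node][1]
        if succ is None:
            continue
        for target in flow(node, fact):
            if (succ, target) not in seen:
                seen.add((succ, target))
                work.append((succ, target))
    return False
```

For example, `holds("n3", "x@n2")` is true while `holds("n3", "x@n0")` is false, because the definition at `n2` kills the one at `n0`.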
In this work, we obtain significant speedups for the general case of
on-demand IFDS with queries that are not necessarily same-context. This is
achieved by exploiting a new graph sparsity parameter, namely the treedepth of
the program's call graph. Our approach is the first to exploit the sparsity of
control-flow graphs and call graphs at the same time and parameterize by both
treewidth and treedepth. We obtain an algorithm with a linear preprocessing
phase that can answer each query in constant time with respect to the input
size. Finally, we show experimental results demonstrating that our approach
significantly outperforms the classical IFDS and its on-demand variant.
Faster Algorithms for Weighted Recursive State Machines
Pushdown systems (PDSs) and recursive state machines (RSMs), which are
linearly equivalent, are standard models for interprocedural analysis. Yet RSMs
are more convenient as they (a) explicitly model function calls and returns,
and (b) specify many natural parameters for algorithmic analysis, e.g., the
number of entries and exits. We consider a general framework where RSM
transitions are labeled from a semiring and path properties are algebraic with
semiring operations, which can model, e.g., interprocedural reachability and
dataflow analysis problems.
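The algebraic view can be made concrete with a small semiring interface: `plus` combines alternative paths and `times` extends a path by an edge. The following minimal sketch assumes a single flat graph with no call or return structure, so it illustrates the semiring abstraction rather than the RSM algorithm; the names `Semiring`, `BOOL`, `TROPICAL`, and `path_values` are hypothetical. The fixpoint iteration terminates for finite-height semirings such as `BOOL`; for `TROPICAL` it relies on non-negative edge weights, as in Bellman-Ford.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, Hashable, List, Tuple, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class Semiring(Generic[T]):
    plus: Callable[[T, T], T]   # combines the values of alternative paths
    times: Callable[[T, T], T]  # extends a path value by one edge weight
    zero: T                     # identity of plus: "no path"
    one: T                      # identity of times: "empty path"

# Boolean semiring (finite height): plain reachability.
BOOL: Semiring[bool] = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)
# Tropical semiring: shortest distances (stabilizes for non-negative weights).
TROPICAL: Semiring[float] = Semiring(min, lambda a, b: a + b, float("inf"), 0)

def path_values(sr: Semiring[T], edges: List[Tuple[Hashable, Hashable, T]],
                source: Hashable) -> Dict[Hashable, T]:
    """Combine (plus) over all paths from source of the product (times) of edge weights."""
    nodes = {u for u, _, _ in edges} | {v for _, v, _ in edges} | {source}
    val = {n: sr.zero for n in nodes}
    val[source] = sr.one
    changed = True
    while changed:  # round-robin fixpoint iteration
        changed = False
        for u, v, w in edges:
            new = sr.plus(val[v], sr.times(val[u], w))
            if new != val[v]:
                val[v] = new
                changed = True
    return val
```

The same `path_values` routine answers reachability or shortest-distance questions depending only on the semiring passed in.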
Our main contributions are new algorithms for several fundamental problems.
As compared to a direct translation of RSMs to PDSs and the best-known existing
bounds of PDSs, our analysis algorithm improves the complexity for
finite-height semirings (which subsume reachability and standard dataflow
properties). We further consider the problem of extracting distance values from
the representation structures computed by our algorithm, and give efficient
algorithms that distinguish the complexity of a one-time preprocessing from the
complexity of each individual query. Another advantage of our algorithm is that
our improvements carry over to the concurrent setting, where we improve the
best-known complexity for the context-bounded analysis of concurrent RSMs.
Finally, we provide a prototype implementation that gives a significant
speed-up on several benchmarks from the SLAM/SDV project.
Symbol-Specific Sparsification of Interprocedural Distributive Environment Problems
Previous work has shown that one can often greatly speed up static analysis
by computing data flows not for every edge in the program's control-flow graph
but instead only along definition-use chains. This yields a so-called sparse
static analysis. Recent work on SparseDroid has shown that taint analysis in particular
can be "sparsified" with extraordinary effectiveness because the taint
state of one variable does not depend on those of others. This allows one to
soundly omit more flow-function computations than in the general case.
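Why taint analysis sparsifies so well can be seen on a toy program: because one variable's taint does not depend on other variables' taint, a fact about variable `v` only needs the flow functions of statements that read or write `v`. The following minimal sketch is not SparseDroid's implementation; it assumes straight-line code without branches, calls, or heap accesses, and `taint_dense`, `taint_sparse`, and the statement encoding are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical straight-line program: each statement is (assigned variable, variables read).
Stmt = Tuple[str, Set[str]]

def taint_dense(stmts: List[Stmt], source: str) -> Tuple[Set[str], int]:
    """Dense analysis: apply the flow function of every statement."""
    tainted, visits = {source}, 0
    for lhs, rhs in stmts:
        visits += 1
        if rhs & tainted:
            tainted.add(lhs)      # taint flows into the assigned variable
        else:
            tainted.discard(lhs)  # overwritten with an untainted value
    return tainted, visits

def taint_sparse(stmts: List[Stmt], source: str) -> Tuple[Set[str], int]:
    """Sparse analysis: each fact jumps to the next statement that mentions its variable."""
    result: Set[str] = set()
    visits = 0
    work = [(source, 0)]
    seen = set()
    while work:
        var, start = work.pop()
        if (var, start) in seen:
            continue
        seen.add((var, start))
        # Linear scan for brevity; a real implementation would precompute def-use successors.
        nxt = next((k for k in range(start, len(stmts))
                    if stmts[k][0] == var or var in stmts[k][1]), None)
        if nxt is None:
            result.add(var)  # fact survives to the end of the program
            continue
        visits += 1
        lhs, rhs = stmts[nxt]
        if var in rhs:
            work.append((lhs, nxt + 1))      # taint flows into lhs
            if lhs != var:
                work.append((var, nxt + 1))  # var itself stays tainted
        # otherwise var is overwritten without being read: the fact is killed
    return result, visits
```

On a program such as `a = src; b = c; c = d; e = a; a = 0`, both versions report `{src, e}` as tainted at the end, but the sparse version applies only three flow functions instead of five.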
In this work, we now assess whether this result carries over to the more
generic setting of so-called Interprocedural Distributive Environment (IDE)
problems. Unlike taint analysis, IDE comprises distributive problems with
large or even infinitely broad domains, such as typestate analysis or linear
constant propagation. Specifically, this paper presents Sparse IDE, a framework
that realizes sparsification for any static analysis that fits the IDE
framework.
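IDE propagates edge functions instead of plain values: edge functions are composed symbolically along a path and evaluated only once the incoming value is known. For linear constant propagation these functions have the form l ↦ a·l + b. The following minimal sketch assumes integer arithmetic and omits the ⊤/⊥ values and the join of edge functions that a complete linear constant propagation needs; `LinearEdge` is a hypothetical name and not part of the Heros API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LinearEdge:
    """IDE edge function for linear constant propagation: l -> a * l + b."""
    a: int
    b: int

    def apply(self, l: int) -> int:
        return self.a * l + self.b

    def then(self, g: "LinearEdge") -> "LinearEdge":
        """Composition 'self, then g': g(self(l)) = g.a * (a * l + b) + g.b."""
        return LinearEdge(g.a * self.a, g.a * self.b + g.b)

# Hypothetical straight-line code:  x = 2 * y + 1;  z = 3 * x - 4
y_to_x = LinearEdge(2, 1)
x_to_z = LinearEdge(3, -4)
y_to_z = y_to_x.then(x_to_z)  # composed before any value of y is known
```

Here `y_to_z` is `LinearEdge(6, -1)`, so once `y = 5` is known, `y_to_z.apply(5)` yields 29 without re-walking the statements.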
We implement Sparse IDE in SparseHeros, as an extension to the popular Heros
IDE solver, and evaluate its performance on real-world Java libraries by
comparing it to the baseline IDE algorithm. To this end, we design, implement
and evaluate a linear constant propagation analysis client on top of
SparseHeros. Our experiments show that, although IDE analyses can only be
sparsified with respect to symbols and not (numeric) values, Sparse IDE can
nonetheless yield significantly lower runtimes and often also lower memory
consumption compared to the original IDE.
Comment: To be published in ICSE 202
Verification of temporal properties involving multiple interacting objects
Defects that arise due to violating a prescribed order for executing statements or executing a disallowed sequence of statements can be hard to detect since the sequence is often spread over multiple functions and source code files. In this dissertation, we develop a verification tool which uses a sound and precise static analysis to verify temporal specifications that can involve multiple objects.
Statically analyzing properties that involve multiple objects requires two separate abstractions: one that abstracts the objects in the program and a second that abstracts the state of a group of objects. We present two such abstractions. Objects are abstracted using a storeless heap abstraction. This provides flow-sensitive tracking of individual objects along control flow paths and precise may-alias information. The state abstraction leverages the object abstraction to abstract the state of a group of related objects.
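A temporal property over a group of objects can be written as a finite automaton over the events that involve that group. The following minimal sketch assumes a hypothetical iterator-invalidation property over a collection `c` and an iterator `i` created from it, and checks one concrete event trace at a time; the dissertation's tool instead verifies such properties statically over all paths using the object and state abstractions, which this sketch does not attempt. `TRANSITIONS` and `first_violation` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical property: calling next on iterator i after its collection c was updated is an error.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("init", "create"): "valid",
    ("valid", "next"): "valid",
    ("valid", "update"): "stale",
    ("stale", "update"): "stale",
    ("stale", "next"): "error",
}

# An event is (kind, subject): create -> (iterator, collection), update -> collection, next -> iterator.
Event = Tuple[str, Union[str, Tuple[str, str]]]

def first_violation(trace: List[Event], coll: str, it: str) -> Optional[int]:
    """Index of the first event that drives the group (coll, it) into the error state, if any."""
    state = "init"
    for idx, (kind, subject) in enumerate(trace):
        relevant = (
            (kind == "create" and subject == (it, coll))
            or (kind == "update" and subject == coll)
            or (kind == "next" and subject == it)
        )
        if not relevant:
            continue  # events on other objects do not affect this group
        state = TRANSITIONS.get((state, kind), state)  # missing transitions are self-loops
        if state == "error":
            return idx
    return None
```

The same trace can be checked once per group binding, e.g. `(c1, i1)` and `(c2, i2)`, which mirrors how a state abstraction tracks groups of related objects separately.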
We use the IFDS algorithm to implement an analysis that computes the object and state abstractions. Since the original IFDS algorithm is not directly suitable for domains involving objects and pointers, we present four extensions to the original IFDS algorithm. We also present results of an empirical study to measure the precision of the analysis.
The performance of the analysis is improved through the use of two types of method summaries. Callee summaries guarantee that using the summary instead of flow-sensitive analysis of the callee does not degrade the precision of the abstraction at the callsite for the callee. For further performance gains, caller summaries that make conservative assumptions for aliasing between parameters of a function call are used. We present results from empirically evaluating the use of these summaries for the object analysis.
Finally, to make the analysis practical for use in the development life cycle, we present a verification tool to configure the analysis and visualize the results. The tool provides a number of configuration options to run the analysis.
The analysis results are presented in a list displaying statements flagged as possible violations of a property and, for each violation, the sequence of events (statements) that leads to this violation.
Boomerang: Demand-Driven Flow- and Context-Sensitive Pointer Analysis for Java
Many current program analyses require highly precise pointer
information about small, targeted parts of a given program. This
motivates the need for demand-driven pointer analyses that compute
information only where required. Pointer analyses generally compute
points-to sets of program variables or answer boolean alias
queries. However, many client analyses require richer pointer
information. For example, taint and typestate analyses often need to
know the set of all aliases of a given variable under a certain
calling context. With most current pointer analyses, clients must
compute such information through repeated points-to or alias queries, increasing complexity and computation time for them.
This paper presents Boomerang, a demand-driven, flow-, field-, and
context-sensitive pointer analysis for Java programs. Boomerang
computes rich results that include both the possible allocation sites of a given pointer (points-to information) and all pointers that can point to those allocation sites (alias information). For increased precision and scalability, clients can query Boomerang with respect to particular calling contexts of interest.
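The two halves of such a rich result (allocation sites of a pointer, then all pointers to those sites) can be illustrated with a much simpler analysis. The following minimal sketch is flow-, field-, and context-insensitive (Andersen-style) and whole-program rather than demand-driven, so it is not Boomerang's algorithm; the statement encoding, `points_to`, and `query` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical statements: ("new", x, site) means x = new site; ("copy", x, y) means x = y.
Stmt = Tuple[str, str, str]

def points_to(stmts: List[Stmt]) -> Dict[str, Set[str]]:
    """Flow-insensitive points-to sets, iterated to a fixpoint."""
    pts: Dict[str, Set[str]] = defaultdict(set)
    changed = True
    while changed:
        changed = False
        for kind, lhs, rhs in stmts:
            new = {rhs} if kind == "new" else set(pts[rhs])
            if not new <= pts[lhs]:
                pts[lhs] |= new
                changed = True
    return pts

def query(stmts: List[Stmt], var: str) -> Tuple[Set[str], Set[str]]:
    """Return var's allocation sites and every variable that may point to one of them."""
    pts = points_to(stmts)
    sites = set(pts.get(var, set()))
    aliases = {v for v, s in pts.items() if s & sites}
    return sites, aliases
```

For `a = new A1; b = a; c = new C1; d = b`, querying `d` returns the site `{A1}` together with the alias set `{a, b, d}` in a single call, instead of one alias query per candidate variable.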
Our experiments show that Boomerang is more precise than existing
demand-driven pointer analyses. Additionally, using Boomerang, the
taint analysis FlowDroid issues up to 29.4x fewer pointer queries
compared to using other pointer analyses that return simpler pointer
information. Furthermore, the search space of Boomerang can be
significantly reduced by requesting calling contexts from the client
analysis.
Lossless, Persisted Summarization of Static Callgraph, Points-To and Data-Flow Analysis
Static analysis is used to automatically detect bugs and security breaches and to aid compiler optimization. Whole-program analysis (WPA) can yield high precision; however, it causes long analysis times and thus does not fit common software-development workflows, which often makes it impractical for large, real-world applications.
This paper thus presents the design and implementation of ModAlyzer, a novel static-analysis approach that aims at accelerating whole-program analysis by making the analysis modular and compositional. It shows how to compute lossless, persisted summaries for callgraph, points-to and data-flow information, and it reports under which circumstances this function-level compositional analysis outperforms WPA.
We implemented ModAlyzer as an extension to LLVM and PhASAR, and applied it to 12 real-world C and C++ applications. At analysis time, ModAlyzer modularly and losslessly summarizes the analysis effect of the library code those applications share, hence avoiding its repeated re-analysis. The experimental results show that the reuse of these summaries can save, on average, 72% of analysis time over WPA. Moreover, because it is lossless, the module-wise analysis fully retains precision and recall. Surprisingly, as our results show, it sometimes even yields precision superior to WPA. The initial summary generation, on average, takes about 3.67 times as long as WPA.
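The reuse pattern, paying once for summary generation and then loading summaries instead of re-analyzing shared library code, can be sketched independently of LLVM. ModAlyzer's real summaries cover call-graph, points-to, and data-flow information; the following minimal sketch only persists a hypothetical parameter-to-return summary computed over a toy copy-statement IR, and `SummaryStore`, `LIBRARY`, and `param_to_return` are hypothetical names unrelated to PhASAR's API.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, Set

# Hypothetical library "IR": each function is a list of copy statements (dst, src);
# p0, p1, ... are parameters and "ret" is the return value.
LIBRARY: Dict[str, List[tuple]] = {
    "wrap": [("t", "p0"), ("ret", "t")],
    "pick": [("ret", "p1")],
}

def param_to_return(function: str) -> List[int]:
    """Summary: indices of parameters whose value may flow to the return value."""
    flows: Dict[str, Set[int]] = {}  # variable -> parameter indices it may hold
    for dst, src in LIBRARY[function]:
        flows[dst] = {int(src[1:])} if src.startswith("p") else set(flows.get(src, set()))
    return sorted(flows.get("ret", set()))

class SummaryStore:
    """Persists per-function summaries as JSON so that later analysis runs can reuse them."""

    def __init__(self, path: Path):
        self.path = Path(path)
        self.summaries = json.loads(self.path.read_text()) if self.path.exists() else {}
        self.computed = 0  # summaries computed (not reused) in this run

    def get(self, function: str, compute: Callable[[str], List[int]]) -> List[int]:
        if function not in self.summaries:
            self.summaries[function] = compute(function)
            self.computed += 1
        return self.summaries[function]

    def save(self) -> None:
        self.path.write_text(json.dumps(self.summaries, sort_keys=True))
```

A first run computes and saves both summaries; a second run that opens the same file returns identical summaries with `computed == 0`, which is the effect the evaluation measures at scale.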