Heap Abstractions for Static Analysis
Heap data is potentially unbounded and seemingly arbitrary. As a consequence,
unlike stack and static memory, heap memory cannot be abstracted directly in
terms of a fixed set of source variable names appearing in the program being
analysed. This makes it an interesting topic of study and there is an abundance
of literature employing heap abstractions. Although most studies have addressed
similar concerns, their formulations and formalisms often seem dissimilar and
sometimes even unrelated. Thus, the insights gained in one description of heap
abstraction may not directly carry over to some other description. This survey
is a result of our quest for a unifying theme in the existing descriptions of
heap abstractions. In particular, our interest lies in the abstractions and not
in the algorithms that construct them.
In our search for a unifying theme, we view a heap abstraction as consisting of
two features: a heap model to represent the heap memory and a summarization
technique for bounding the heap representation. We classify the models as
storeless, store-based, and hybrid. We describe various summarization
techniques based on k-limiting, allocation sites, patterns, variables, other
generic instrumentation predicates, and higher-order logics. This approach
allows us to compare the insights of a large number of seemingly dissimilar
heap abstractions and also paves the way for creating new abstractions by
mix-and-match of models and summarization techniques.
Comment: 49 pages, 20 figures
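As an illustration of one summarization technique named above, k-limiting can be applied to access paths in a storeless-style model. The following is a minimal Python sketch, not taken from the survey; the tuple encoding of access paths and the `*` summary marker are assumptions made for illustration.

```python
from typing import Tuple

# A storeless-style access path: a root variable plus a sequence of field
# names, e.g. ("x", ("next", "next", "data")) stands for x.next.next.data.
AccessPath = Tuple[str, Tuple[str, ...]]

# Hypothetical marker meaning "followed by any (possibly empty) field sequence".
SUMMARY = "*"


def k_limit(path: AccessPath, k: int) -> AccessPath:
    """Keep the first k field dereferences and summarize everything after them."""
    root, fields = path
    if len(fields) <= k:
        return path
    return (root, fields[:k] + (SUMMARY,))


def render(path: AccessPath) -> str:
    """Turn an access path back into dotted notation."""
    root, fields = path
    return ".".join((root,) + fields)


if __name__ == "__main__":
    print(render(k_limit(("x", ("next", "next", "next", "data")), 2)))  # x.next.next.*
    # Unboundedly many paths x, x.next, x.next.next, ... collapse to a finite set.
    paths = {k_limit(("x", ("next",) * n), 2) for n in range(50)}
    print(len(paths))  # 4
```

The bound comes at a price in precision: once two paths agree on their first k fields, the abstraction can no longer distinguish them.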
Heap Reference Analysis Using Access Graphs
Despite significant progress in the theory and practice of program analysis,
analysing properties of heap data has not reached the same level of maturity as
the analysis of static and stack data. The spatial and temporal structure of
stack and static data is well understood while that of heap data seems
arbitrary and is unbounded. We devise bounded representations which summarize
properties of the heap data. This summarization is based on the structure of
the program which manipulates the heap. The resulting summary representations
are certain kinds of graphs called access graphs. The boundedness of these
representations and the monotonicity of the operations to manipulate them make
it possible to compute them through data flow analysis.
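The way program structure bounds the representation can be sketched with a small graph of access paths. This is a simplified Python sketch, not the paper's formal definition: it assumes, for illustration, that a node is identified by a field name together with the label of the statement that dereferences it, and it treats every node as accepting. The class `AccessGraph` and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple, Union

# Hypothetical node identity: (field name, label of the statement dereferencing it).
Node = Tuple[str, int]
Vertex = Union[str, Node]  # the root vertex is the variable name itself


@dataclass
class AccessGraph:
    root: str
    edges: Dict[Vertex, Set[Node]] = field(default_factory=dict)

    def add_path(self, steps: List[Node]) -> None:
        """Insert an access path; equal (field, label) nodes are shared, so
        repeated dereferences at the same statement close a cycle."""
        prev: Vertex = self.root
        for node in steps:
            self.edges.setdefault(prev, set()).add(node)
            prev = node

    def accepts(self, fields: List[str]) -> bool:
        """True if some walk from the root spells out `fields`."""
        frontier: Set[Vertex] = {self.root}
        for name in fields:
            frontier = {n for v in frontier
                        for n in self.edges.get(v, set()) if n[0] == name}
            if not frontier:
                return False
        return True

    def size(self) -> int:
        """Number of distinct vertices, including the root."""
        vertices: Set[Vertex] = set(self.edges)
        for succs in self.edges.values():
            vertices |= succs
        return len(vertices)


if __name__ == "__main__":
    g = AccessGraph("x")
    # A path x.next...next.data where every next is dereferenced at statement 3
    # (e.g. inside a loop) and data is read at statement 5.
    g.add_path([("next", 3)] * 10 + [("data", 5)])
    print(g.size())                              # 3, however many iterations
    print(g.accepts(["next"] * 25 + ["data"]))   # True: the self-loop covers any length
    print(g.accepts(["data", "next"]))           # False
```

Because the number of (field, label) pairs in a program is finite, such graphs stay bounded, which is what makes a terminating data flow analysis over them possible.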
An important application which benefits from heap reference analysis is
garbage collection, where currently liveness is conservatively approximated by
reachability from program variables. As a consequence, current garbage
collectors leave a lot of garbage uncollected, a fact which has been confirmed
by several empirical studies. We propose the first-ever end-to-end static
analysis to distinguish live objects from reachable objects. We use this
information to make dead objects unreachable by modifying the program. This
application is interesting because it requires discovering data flow
information representing complex semantics. In particular, we discover four
properties of heap data: liveness, aliasing, availability, and anticipability.
Together, they cover all combinations of directions of analysis (i.e. forward
and backward) and confluence of information (i.e. union and intersection). Our
analysis can also be used for plugging memory leaks in C/C++ programs.
Comment: Accepted for printing by ACM TOPLAS. This version incorporates
referees' comments.
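The observation that the four analyses span every combination of direction and confluence can be made concrete with one generic iterative solver. The following is a minimal Python sketch over plain sets of named facts (variables or expression strings), not over access graphs as in the paper; the names `solve` and `gen_kill` are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List

Facts = FrozenSet[str]
Transfer = Callable[[Facts], Facts]


def gen_kill(gen: Iterable[str], kill: Iterable[str]) -> Transfer:
    """Standard gen/kill transfer function: gen U (facts - kill)."""
    g, k = frozenset(gen), frozenset(kill)
    return lambda facts: g | (facts - k)


def solve(nodes: List[str], succs: Dict[str, List[str]],
          transfer: Dict[str, Transfer], forward: bool, must: bool,
          universe: Facts) -> Dict[str, Facts]:
    """Round-robin solver parameterized by direction and confluence.

    Returns, per node, the facts after applying its transfer function:
    OUT for forward problems, IN for backward ones. Nodes with no incoming
    information (entry for forward, exit for backward) start from the empty set.
    """
    preds: Dict[str, List[str]] = {n: [] for n in nodes}
    for n in nodes:
        for s in succs.get(n, []):
            preds[s].append(n)
    sources = preds if forward else {n: list(succs.get(n, [])) for n in nodes}
    # Must-problems (intersection) start from all facts, may-problems from none.
    result: Dict[str, Facts] = {n: (universe if must else frozenset()) for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            incoming = [result[m] for m in sources[n]]
            merged: Facts
            if not incoming:
                merged = frozenset()
            elif must:
                merged = frozenset.intersection(*incoming)
            else:
                merged = frozenset.union(*incoming)
            new = transfer[n](merged)
            if new != result[n]:
                result[n] = new
                changed = True
    return result


if __name__ == "__main__":
    # Liveness (backward, union):  1: x = new T   2: y = x.f   3: use(y)
    live = solve(["1", "2", "3"], {"1": ["2"], "2": ["3"]},
                 {"1": gen_kill([], ["x"]), "2": gen_kill(["x"], ["y"]),
                  "3": gen_kill(["y"], [])},
                 forward=False, must=False, universe=frozenset({"x", "y"}))
    print(sorted(live["2"]))  # ['x']

    # Availability (forward, intersection) on a diamond e -> a, b -> j.
    avail = solve(["e", "a", "b", "j"], {"e": ["a", "b"], "a": ["j"], "b": ["j"]},
                  {"e": gen_kill([], []), "a": gen_kill(["x.f"], []),
                   "b": gen_kill(["x.f", "x.g"], []), "j": gen_kill([], [])},
                  forward=True, must=True, universe=frozenset({"x.f", "x.g"}))
    print(sorted(avail["j"]))  # ['x.f']: only x.f is available on both branches
```

Swapping the `forward` and `must` flags yields the other two combinations (for example, anticipability is backward with intersection), which is the sense in which one framework covers all four properties.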
Efficient and Effective Handling of Exceptions in Java Points-To Analysis
A joint points-to and exception analysis has been shown to yield benefits in both precision and performance. Treating exceptions as regular objects,
however, incurs significant and rather unexpected overhead. We show that in a
typical joint analysis most of the objects computed to flow in and out of a method
are due to exceptional control flow and not normal call-return control flow. For
instance, a context-insensitive analysis of the Antlr benchmark from the DaCapo
suite computes 4-5 times more objects going in or out of a method due to exceptional control flow than due to normal control flow. As a consequence, the
analysis spends a large amount of its time considering exceptions.
We show that the problem can be addressed both effectively and elegantly by
coarsening the representation of exception objects. An interesting finding is that, instead of recording each distinct exception object, we can collapse all exceptions
of the same type and use one representative object per type to yield nearly identical precision (loss of less than 0.1%) but with a boost in performance of at least
50% for most analyses and benchmarks and large space savings (usually 40% or
more).
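The coarsening idea, one representative object per exception type instead of one per allocation site, can be sketched as a change to the function that maps allocations to abstract objects. This is a simplified Python sketch, not the paper's implementation; the fixed `EXCEPTION_TYPES` set and the allocation-site labels are hypothetical (a real Java analysis would check subtyping against `java.lang.Throwable`).

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple

# Hypothetical set of types treated as exceptions in this sketch.
EXCEPTION_TYPES: FrozenSet[str] = frozenset({"IOException", "NullPointerException"})


@dataclass(frozen=True)
class AbstractObject:
    key: str        # allocation-site label, or "<Type>" for a per-type representative
    type_name: str


def abstract_object(site: str, type_name: str, coarsen: bool) -> AbstractObject:
    """Map one allocation to its abstract heap object."""
    if coarsen and type_name in EXCEPTION_TYPES:
        # One representative object per exception type, regardless of site.
        return AbstractObject(f"<{type_name}>", type_name)
    return AbstractObject(site, type_name)


def abstract_heap(allocs: List[Tuple[str, str]], coarsen: bool) -> Set[AbstractObject]:
    """Set of abstract objects induced by a list of (site, type) allocations."""
    return {abstract_object(site, t, coarsen) for site, t in allocs}


if __name__ == "__main__":
    allocs = [("Parser.java:10", "IOException"), ("Lexer.java:42", "IOException"),
              ("Util.java:7", "NullPointerException"),
              ("Main.java:3", "ArrayList"), ("Main.java:9", "ArrayList")]
    print(len(abstract_heap(allocs, coarsen=False)))  # 5: one object per site
    print(len(abstract_heap(allocs, coarsen=True)))   # 4: the two IOException sites merge
```

Ordinary objects keep their per-site identity, so only the exception-related part of each points-to set shrinks, which is consistent with the small precision loss the abstract reports.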
Active Learning of Points-To Specifications
When analyzing programs, large libraries pose significant challenges to
static points-to analysis. A popular solution is to have a human analyst
provide points-to specifications that summarize relevant behaviors of library
code, which can substantially improve precision and handle missing code such as
native code. We propose ATLAS, a tool that automatically infers points-to
specifications. ATLAS synthesizes unit tests that exercise the library code,
and then infers points-to specifications based on observations from these
executions. ATLAS automatically infers specifications for the Java standard
library, and produces better results for a client static information flow
analysis on a benchmark of 46 Android apps compared to using existing
handwritten specifications.
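The test-then-observe idea behind inferring points-to specifications from executions can be illustrated with a toy. This Python sketch is not ATLAS's algorithm (ATLAS targets the Java standard library and a richer specification language); the class `ToyBox`, the one-test-per-method-pair strategy, and the pair-shaped "specification" are all hypothetical.

```python
from typing import Any, List, Tuple


class ToyBox:
    """Hypothetical library class standing in for code the analysis cannot see."""

    def __init__(self) -> None:
        self._item: Any = None

    def set(self, item: Any) -> None:
        self._item = item

    def get(self) -> Any:
        return self._item

    def fresh(self) -> Any:
        return object()


def observe_flow(cls: type, setter: str, getter: str) -> bool:
    """One synthesized 'unit test': pass a fresh object to `setter`, then check
    whether `getter` on the same receiver returns that very object (identity)."""
    receiver = cls()
    marker = object()
    getattr(receiver, setter)(marker)
    return getattr(receiver, getter)() is marker


def infer_specs(cls: type, setters: List[str], getters: List[str]) -> List[Tuple[str, str]]:
    """Candidate specs (s, g): 'an argument of s may be returned by a later g'."""
    return [(s, g) for s in setters for g in getters if observe_flow(cls, s, g)]


if __name__ == "__main__":
    print(infer_specs(ToyBox, ["set"], ["get", "fresh"]))  # [('set', 'get')]
```

Identity (`is`) rather than equality is what matters for aliasing. A dynamic observation only witnesses flows that the executed tests happen to trigger, so a sketch like this one can miss behaviors that no test exercises.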
Structural Analysis: Shape Information via Points-To Computation
This paper introduces a new hybrid memory analysis, Structural Analysis,
which combines an expressive shape analysis style abstract domain with
efficient and simple points-to style transfer functions. Using data from
empirical studies on the runtime heap structures and the programmatic idioms
used in modern object-oriented languages we construct a heap analysis with the
following characteristics: (1) it can express a rich set of structural, shape,
and sharing properties which are not provided by a classic points-to analysis
and that are useful for optimization and error detection applications; (2) it
uses efficient, weakly-updating, set-based transfer functions which enable the
analysis to be more robust and scalable than a shape analysis; and (3) it can be
used as the basis for a scalable interprocedural analysis that produces precise
results in practice.
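What "weakly-updating, set-based transfer functions" means can be illustrated on a plain points-to state. The sketch below does not model the richer shape and sharing domain of Structural Analysis; `PointsToState` and its methods are hypothetical names, and allocation-site labels stand in for abstract heap objects.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple


class PointsToState:
    """Points-to facts over allocation-site objects, with weak updates on fields."""

    def __init__(self) -> None:
        self.vars: DefaultDict[str, Set[str]] = defaultdict(set)
        self.heap: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)

    def alloc(self, x: str, site: str) -> None:
        """x = new T() at allocation site `site`."""
        self.vars[x] = {site}

    def store(self, x: str, f: str, y: str) -> None:
        """x.f = y as a weak update: add the new targets, never remove old ones.
        Cheap and sound even if x may point to several (or summary) objects."""
        for obj in self.vars[x]:
            self.heap[(obj, f)] |= self.vars[y]

    def load(self, x: str, y: str, f: str) -> None:
        """x = y.f: union of the f-targets of everything y may point to."""
        self.vars[x] = set().union(*(self.heap[(obj, f)] for obj in self.vars[y]))


if __name__ == "__main__":
    s = PointsToState()
    s.alloc("a", "s1")
    s.alloc("b", "s2")
    s.alloc("c", "s3")
    s.store("a", "f", "b")
    s.store("a", "f", "c")   # a strong update would discard s2 here
    s.load("d", "a", "f")
    print(sorted(s.vars["d"]))  # ['s2', 's3']
```

The design trade-off is visible in the last line: a strong update would give the more precise `['s3']`, but it is only sound when the target is known to be exactly one concrete object, which requires the more expensive reasoning of a shape analysis.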
The analysis has been implemented for .NET bytecode, and using this
implementation we evaluate both the runtime cost and the precision of the
results on a number of well known benchmarks and real world programs. Our
experimental evaluations show that the domain defined in this paper is capable
of precisely expressing the majority of the connectivity, shape, and sharing
properties that occur in practice and, despite the use of weak updates, the
static analysis is able to precisely approximate the ideal results. The
analysis is capable of analyzing large real-world programs (over 30K bytecodes)
in less than 65 seconds and using less than 130 MB of memory. In summary, this
work presents a new type of memory analysis that advances the state of the art
with respect to expressive power, precision, and scalability, and represents a
new area of study on the relationships between, and the combination of, concepts
from shape and points-to analyses.