Efficient Computation of Parameterized Pointer Information for Interprocedural Analyses
The pointer information provided by many algorithms
identifies a memory location by the same name throughout
a program. Such pointer information is ill-suited to analyzing
C programs because a program analysis that uses it may
propagate a large amount of spurious information across
procedure boundaries. This paper
presents a modular algorithm that efficiently computes
parameterized pointer information in which symbolic names
are introduced to identify memory locations whose addresses
may be passed into a procedure. Because a symbolic name
may identify different memory locations when the procedure
is invoked at different call sites, using parameterized
pointer information can help a program analysis reduce the
spurious information that is propagated across procedure
boundaries. The paper also presents a set of empirical studies
that demonstrate (a) the efficiency of the algorithm and
(b) the benefits of using parameterized pointer information
over using non-parameterized pointer information in program
analyses. The studies show that using parameterized
pointer information may significantly improve the precision
and the efficiency of many program analyses.
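The effect described above can be illustrated with a small, purely schematic Python model (all names here are invented for illustration): a procedure f(p) that writes through its pointer parameter is summarized once, and the summary is either merged over all call sites or parameterized by a symbolic name that each call site instantiates.

```python
# Schematic model (invented names): procedure f(p) writes through p.
# We compare two ways of summarizing which locations f may modify.

# Call sites: f(&a) at site 1, f(&b) at site 2.
call_sites = {1: "a", 2: "b"}

# Non-parameterized: one global name for p's pointee, merged over all
# call sites, so each site appears to modify both locations.
merged_pointees = set(call_sites.values())            # {'a', 'b'}
mod_nonparam = {site: merged_pointees for site in call_sites}

# Parameterized: the symbolic name 'phi_p' stands for p's pointee in
# the summary; each call site instantiates it with its own actual.
summary_mod = {"phi_p"}                                # f modifies *p only
mod_param = {site: {actual if name == "phi_p" else name
                    for name in summary_mod}
             for site, actual in call_sites.items()}

print(sorted(mod_nonparam[1]))   # ['a', 'b'] -- spurious 'b' at site 1
print(sorted(mod_param[1]))      # ['a']      -- precise
```

The spurious fact at site 1 is exactly the cross-call-site pollution the abstract describes; the parameterized summary avoids it without reanalyzing f per call site.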
Structural Analysis: Shape Information via Points-To Computation
This paper introduces a new hybrid memory analysis, Structural Analysis,
which combines an expressive shape analysis style abstract domain with
efficient and simple points-to style transfer functions. Using data from
empirical studies on the runtime heap structures and the programmatic idioms
used in modern object-oriented languages we construct a heap analysis with the
following characteristics: (1) it can express a rich set of structural, shape,
and sharing properties which are not provided by a classic points-to analysis
and that are useful for optimization and error detection applications; (2) it
uses efficient, weakly-updating, set-based transfer functions which enable the
analysis to be more robust and scalable than a shape analysis; and (3) it can be
used as the basis for a scalable interprocedural analysis that produces precise
results in practice.
The analysis has been implemented for .NET bytecode, and using this
implementation we evaluate both the runtime cost and the precision of the
results on a number of well-known benchmarks and real-world programs. Our
experimental evaluations show that the domain defined in this paper is capable
of precisely expressing the majority of the connectivity, shape, and sharing
properties that occur in practice and, despite the use of weak updates, the
static analysis is able to precisely approximate the ideal results. The
analysis is capable of analyzing large real-world programs (over 30K bytecodes)
in less than 65 seconds and using less than 130MB of memory. In summary this
work presents a new type of memory analysis that advances the state of the art
with respect to expressive power, precision, and scalability and represents a
new area of study on the relationships between and combination of concepts from
shape and points-to analyses.
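A rough illustration of the second characteristic above, a weakly-updating, set-based transfer function, can be sketched in a few lines of Python. The graph encoding and names below are assumptions of this sketch, not the paper's implementation:

```python
# Illustrative sketch: set-based, weakly-updating transfer function
# for a points-to graph.  A weak update on "x.f = y" adds edges
# instead of replacing them, which keeps the transfer function
# simple and monotone -- at the cost of some precision.

def assign_field_weak(graph, x, f, y):
    """Weak update for 'x.f = y': every node x may point to gains
    the targets of y on field f, without killing old edges."""
    for node in graph.get(x, set()):
        graph.setdefault((node, f), set()).update(graph.get(y, set()))

heap = {"x": {"o1", "o2"}, "y": {"o3"}, ("o1", "f"): {"o9"}}
assign_field_weak(heap, "x", "f", "y")
print(sorted(heap[("o1", "f")]))   # ['o3', 'o9'] -- old edge survives
print(heap[("o2", "f")])           # {'o3'}
```

A shape analysis would try a strong update here (killing the o1.f -> o9 edge when possible); the point of the hybrid design is that the cheap weak update above still approximates the ideal result closely in practice.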
Parameterized Algorithms for Scalable Interprocedural Data-flow Analysis
Data-flow analysis is a general technique used to compute information of
interest at different points of a program and is considered to be a cornerstone
of static analysis. In this thesis, we consider interprocedural data-flow
analysis as formalized by the standard IFDS framework, which can express many
widely-used static analyses such as reaching definitions, live variables, and
null-pointer dereference. We focus on the well-studied on-demand setting in which queries
arrive one-by-one in a stream and each query should be answered as fast as
possible. While the classical IFDS algorithm provides a polynomial-time
solution to this problem, it is not scalable in practice. Specifically, it
either requires a quadratic-time preprocessing phase or takes linear time per
query, both of which are untenable for modern huge codebases with hundreds of
thousands of lines. Previous works have already shown that parameterizing the
problem by the treewidth of the program's control-flow graph is promising and
can lead to significant gains in efficiency. Unfortunately, these results were
only applicable to the limited special case of same-context queries.
In this work, we obtain significant speedups for the general case of
on-demand IFDS with queries that are not necessarily same-context. This is
achieved by exploiting a new graph sparsity parameter, namely the treedepth of
the program's call graph. Our approach is the first to exploit the sparsity of
control-flow graphs and call graphs at the same time and parameterize by both
treewidth and treedepth. We obtain an algorithm with a linear preprocessing
phase that can answer each query in constant time with respect to the input
size. Finally, we show experimental results demonstrating that our approach
significantly outperforms the classical IFDS algorithm and its on-demand variant.
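In the IFDS view, a data-flow query reduces to reachability in an "exploded" graph whose nodes pair a program point with a fact. The hand-built graph below is only meant to show what one on-demand query looks like; the real IFDS algorithm tabulates procedure summaries rather than searching a prebuilt graph:

```python
# Schematic on-demand IFDS-style query as graph reachability.
# Fact 0 is the special "Lambda" fact that is always reachable.
from collections import deque

# Exploded edges: (point, fact) -> [(point', fact'), ...]
edges = {
    ("entry", 0): [("n1", 0), ("n1", "x_def")],   # x defined at n1
    ("n1", 0): [("n2", 0)],
    ("n1", "x_def"): [("n2", "x_def")],           # definition survives n1->n2
    ("n2", 0): [("exit", 0)],
    ("n2", "x_def"): [],                          # definition killed at n2
}

def on_demand_query(target):
    """Answer one query 'does fact d hold at point n' by a forward
    search from the entry node -- linear time per query."""
    seen, work = {("entry", 0)}, deque([("entry", 0)])
    while work:
        node = work.popleft()
        if node == target:
            return True
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return False

print(on_demand_query(("n2", "x_def")))    # True: the def reaches n2
print(on_demand_query(("exit", "x_def")))  # False: killed before exit
```

The thesis's contribution is avoiding exactly this per-query linear search (or its quadratic precomputation) by exploiting treewidth and treedepth to answer queries in constant time after linear preprocessing.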
Automatically Finding Bugs in Open Source Programs
We consider properties desirable for static analysis tools aimed at finding bugs in real open-source code, and review tools based on various approaches to defect detection. We then describe a static analysis tool that includes a framework for flow-sensitive interprocedural dataflow analysis and scales to the analysis of large
programs. The framework enables the implementation of multiple checkers that search for specific bugs, such as null-pointer dereferences and buffer overflows, abstracting details such as alias analysis away from the checkers.
Expression-based aliasing for OO-languages
Alias analysis has long been an interesting research topic in the verification
and optimization of programs. The undecidability of determining whether two
expressions in a program may refer to the same object is the main source of
the challenges in alias analysis. In this paper we propose an extension
of a previously introduced alias calculus based on program expressions to the
setting of unbounded program executions, such as infinite loops and recursive
calls. Moreover, we devise a corresponding executable specification in the
K-framework. An important property of our extension is that, in a
non-concurrent setting, the corresponding alias expressions can be
over-approximated in terms of a notion of regular expressions. This further
enables us to show that the associated K-machinery implements an algorithm that
always stops and provides a sound over-approximation of the "may aliasing"
information, where soundness stands for the lack of false negatives. As a case
study, we analyze the integration and further applications of the alias
calculus in SCOOP. The latter is an object-oriented programming model for
concurrency, recently formalized in Maude; K-definitions can be compiled into
Maude for execution.
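A toy rendering of the underlying idea (not the calculus itself) treats "may alias" as a set of unordered pairs of program expressions, updated by assignments and over-approximated for unbounded loops by iterating to a fixpoint:

```python
# Toy expression-pair may-alias sketch (invented encoding).
def alias_after_assign(aliases, lhs, rhs):
    """After 'lhs := rhs': lhs may alias rhs and whatever rhs aliased;
    old pairs involving lhs are dropped (a strong update)."""
    kept = {p for p in aliases if lhs not in p}
    new = {frozenset((lhs, rhs))}
    new |= {frozenset((lhs, e)) for p in kept if rhs in p
                                for e in p if e != rhs}
    return kept | new

def loop_fixpoint(aliases, body_assigns):
    """Over-approximate an unbounded loop: accumulate the effect of
    the body until no new alias pair appears.  Sound (no false
    negatives) but imprecise, matching the abstract's guarantee."""
    state = set(aliases)
    while True:
        nxt = set(state)
        for lhs, rhs in body_assigns:
            nxt |= alias_after_assign(state, lhs, rhs)
        if nxt == state:
            return state
        state = nxt

a0 = alias_after_assign(set(), "x", "y")     # after 'x := y'
loop = loop_fixpoint(a0, [("z", "x")])       # loop body: 'z := x'
print(frozenset(("x", "y")) in a0)           # True
print(frozenset(("z", "y")) in loop)         # True (via z ~ x ~ y)
```

The paper's regular-expression over-approximation plays the role of this fixpoint for genuinely unbounded executions, which is what guarantees the K-machinery's termination.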
Heap Abstractions for Static Analysis
Heap data is potentially unbounded and seemingly arbitrary. As a consequence,
unlike stack and static memory, heap memory cannot be abstracted directly in
terms of a fixed set of source variable names appearing in the program being
analysed. This makes it an interesting topic of study and there is an abundance
of literature employing heap abstractions. Although most studies have addressed
similar concerns, their formulations and formalisms often seem dissimilar and
sometimes even unrelated. Thus, the insights gained in one description of heap
abstraction may not directly carry over to some other description. This survey
is a result of our quest for a unifying theme in the existing descriptions of
heap abstractions. In particular, our interest lies in the abstractions and not
in the algorithms that construct them.
In our search of a unified theme, we view a heap abstraction as consisting of
two features: a heap model to represent the heap memory and a summarization
technique for bounding the heap representation. We classify the models as
storeless, store based, and hybrid. We describe various summarization
techniques based on k-limiting, allocation sites, patterns, variables, other
generic instrumentation predicates, and higher-order logics. This approach
allows us to compare the insights of a large number of seemingly dissimilar
heap abstractions and also paves the way for creating new abstractions by
mixing and matching models and summarization techniques.
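One of the summarization techniques the survey classifies, allocation-site abstraction in a store-based model, can be sketched concretely: every object allocated at the same program point collapses into one summary node, which bounds an unbounded heap. The event encoding below is this sketch's assumption, not the survey's formalism.

```python
# Allocation-site summarization of a concrete heap (invented encoding).
def build_abstract_heap(trace):
    """trace: events (alloc_site, obj_id, field, target_obj_id);
    field is None for a plain allocation event.  Returns abstract
    edges (site, field, site) between allocation-site summary nodes."""
    site_of = {}
    for site, obj, _, _ in trace:
        site_of.setdefault(obj, site)        # first event names the site
    edges = set()
    for _, obj, field, target in trace:
        if field is not None:
            edges.add((site_of[obj], field, site_of[target]))
    return edges

# Two list cells from the same 'new' at site L7, linked via 'next':
trace = [("L7", "c1", None, None),
         ("L7", "c2", None, None),
         ("L3", "hd", None, None),
         ("L3", "hd", "first", "c1"),
         ("L7", "c1", "next", "c2")]
print(sorted(build_abstract_heap(trace)))
# [('L3', 'first', 'L7'), ('L7', 'next', 'L7')]
```

The self-loop on the L7 node is how a finite abstraction represents a list of unbounded length; the other techniques the survey lists (k-limiting, patterns, variables, instrumentation predicates) are alternative ways of drawing the same kind of finite picture.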
Flow- and context-sensitive points-to analysis using generalized points-to graphs
© Springer-Verlag GmbH Germany 2016. Bottom-up interprocedural methods of program analysis construct summary flow functions for procedures to capture the effect of their calls and have been used effectively for many analyses. However, these methods seem computationally expensive for flow- and context-sensitive points-to analysis (FCPA), which requires modelling unknown locations accessed indirectly through pointers. Such accesses are commonly handled by using placeholders to explicate unknown locations or by using multiple call-specific summary flow functions. We generalize the concept of points-to relations by using the counts of indirection levels, leaving the unknown locations implicit. This allows us to create summary flow functions in the form of generalized points-to graphs (GPGs) without the need of placeholders. By design, GPGs represent both memory (in terms of classical points-to facts) and memory transformers (in terms of generalized points-to facts). We perform FCPA by progressively reducing generalized points-to facts to classical points-to facts. GPGs distinguish between may and must pointer updates, thereby facilitating strong updates within calling contexts. The size of GPGs is linearly bounded by the number of variables and is independent of the number of statements. Empirical measurements on SPEC benchmarks show that GPGs are indeed compact in spite of large procedure sizes. This allows us to scale FCPA to 158 kLoC using GPGs (compared to 35 kLoC reported by liveness-based FCPA). Thus GPGs hold a promise of efficiency and scalability for FCPA without compromising precision.
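The indirection-count idea can be caricatured in a few lines of Python. This is a deliberate simplification of GPGs, and the encoding is this sketch's assumption, not the paper's: a generalized fact (x, i, y, j) says that the location reached by i-1 dereferences of x points to the location reached by j dereferences of y, so the classical fact "x points to y" is (x, 1, y, 0), and reduction lowers indirection counts using known classical facts.

```python
# Simplified generalized points-to facts with indirection counts.
def locs(pts, v, k):
    """Locations reached from v after k dereferences under the
    classical points-to map pts."""
    frontier = {v}
    for _ in range(k):
        frontier = {t for s in frontier for t in pts.get(s, set())}
    return frontier

def reduce_edge(pts, edge):
    """Lower a generalized edge (x, i, y, j) to classical facts
    (l, 1, r, 0), i.e. 'l points to r'."""
    x, i, y, j = edge
    return {(l, 1, r, 0) for l in locs(pts, x, i - 1)
                         for r in locs(pts, y, j)}

pts = {"p": {"a"}, "q": {"b"}}              # classical: p -> a, q -> b
# The statement '*p = q' is the generalized fact (p, 2, q, 1):
print(reduce_edge(pts, ("p", 2, "q", 1)))   # {('a', 1, 'b', 0)}: a -> b
```

No placeholder variable was needed for the unknown pointee of p: the count 2 carries that information until the calling context supplies the classical fact p -> a, which is the placeholder-free reduction the abstract describes.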