24,678 research outputs found
Generalized Points-to Graphs: A New Abstraction of Memory in the Presence of Pointers
Flow- and context-sensitive points-to analysis is difficult to scale; for
top-down approaches, the problem centers on repeated analysis of the same
procedure; for bottom-up approaches, the abstractions used to represent
procedure summaries have not scaled while preserving precision.
We propose a novel abstraction called the Generalized Points-to Graph (GPG)
which views points-to relations as memory updates and generalizes them using
the counts of indirection levels leaving the unknown pointees implicit. This
allows us to construct GPGs as compact representations of bottom-up procedure
summaries in terms of memory updates and control flow between them. Their
compactness is ensured by the following optimizations: strength reduction
reduces the indirection levels, redundancy elimination removes redundant memory
updates and minimizes control flow (without over-approximating data dependence
between memory updates), and call inlining enhances the opportunities of these
optimizations. We devise novel operations and data flow analyses for these
optimizations.
Our quest for scalability of points-to analysis leads to the following
insight: The real killer of scalability in program analysis is not the amount
of data but the amount of control flow that it may be subjected to in search of
precision. The effectiveness of GPGs lies in the fact that they discard as much
control flow as possible without losing precision (i.e., by preserving data
dependence without over-approximation). This is the reason why the GPGs are
very small even for main procedures that contain the effect of the entire
program. This allows our implementation to scale to 158kLoC for C programs
Parameterized Construction of Program Representations for Sparse Dataflow Analyses
Data-flow analyses usually associate information with control flow regions.
Informally, if these regions are too small, like a point between two
consecutive statements, we call the analysis dense. On the other hand, if these
regions include many such points, then we call it sparse. This paper presents a
systematic method to build program representations that support sparse
analyses. To pave the way to this framework we clarify the bibliography about
well-known intermediate program representations. We show that our approach, up
to parameter choice, subsumes many of these representations, such as the SSA,
SSI and e-SSA forms. In particular, our algorithms are faster, simpler and more
frugal than the previous techniques used to construct SSI - Static Single
Information - form programs. We produce intermediate representations isomorphic
to Choi et al.'s Sparse Evaluation Graphs (SEG) for the family of data-flow
problems that can be partitioned per variables. However, contrary to SEGs, we
can handle - sparsely - problems that are not in this family
A Simple and Scalable Static Analysis for Bound Analysis and Amortized Complexity Analysis
We present the first scalable bound analysis that achieves amortized
complexity analysis. In contrast to earlier work, our bound analysis is not
based on general purpose reasoners such as abstract interpreters, software
model checkers or computer algebra tools. Rather, we derive bounds directly
from abstract program models, which we obtain from programs by comparatively
simple invariant generation and symbolic execution techniques. As a result, we
obtain an analysis that is more predictable and more scalable than earlier
approaches. Our experiments demonstrate that our analysis is fast and at the
same time able to compute bounds for challenging loops in a large real-world
benchmark. Technically, our approach is based on lossy vector addition systems
(VASS). Our bound analysis first computes a lexicographic ranking function that
proves the termination of a VASS, and then derives a bound from this ranking
function. Our methodology achieves amortized analysis based on a new insight
how lexicographic ranking functions can be used for bound analysis
Variability Abstractions: Trading Precision for Speed in Family-Based Analyses (Extended Version)
Family-based (lifted) data-flow analysis for Software Product Lines (SPLs) is
capable of analyzing all valid products (variants) without generating any of
them explicitly. It takes as input only the common code base, which encodes all
variants of a SPL, and produces analysis results corresponding to all variants.
However, the computational cost of the lifted analysis still depends inherently
on the number of variants (which is exponential in the number of features, in
the worst case). For a large number of features, the lifted analysis may be too
costly or even infeasible. In this paper, we introduce variability abstractions
defined as Galois connections and use abstract interpretation as a formal
method for the calculational-based derivation of approximate (abstracted)
lifted analyses of SPL programs, which are sound by construction. Moreover,
given an abstraction we define a syntactic transformation that translates any
SPL program into an abstracted version of it, such that the analysis of the
abstracted SPL coincides with the corresponding abstracted analysis of the
original SPL. We implement the transformation in a tool, reconfigurator that
works on Object-Oriented Java program families, and evaluate the practicality
of this approach on three Java SPL benchmarks.Comment: 50 pages, 10 figure
Dynamic data flow testing
Data flow testing is a particular form of testing that identifies data flow relations as test objectives. Data flow testing has recently attracted new interest in the context of testing object oriented systems, since data flow information is well suited to capture relations among the object states, and can thus provide useful information for testing method interactions. Unfortunately, classic data flow testing, which is based on static analysis of the source code, fails to identify many important data flow relations due to the dynamic nature of object oriented systems. This thesis presents Dynamic Data Flow Testing, a technique which rethinks data flow testing to suit the testing of modern object oriented software. Dynamic Data Flow Testing stems from empirical evidence that we collect on the limits of classic data flow testing techniques. We investigate such limits by means of Dynamic Data Flow Analysis, a dynamic implementation of data flow analysis that computes sound data flow information on program traces. We compare data flow information collected with static analysis of the code with information observed dynamically on execution traces, and empirically observe that the data flow information computed with classic analysis of the source code misses a significant part of information that corresponds to relevant behaviors that shall be tested. In view of these results, we propose Dynamic Data Flow Testing. The technique promotes the synergies between dynamic analysis, static reasoning and test case generation for automatically extending a test suite with test cases that execute the complex state based interactions between objects. Dynamic Data Flow Testing computes precise data flow information of the program with Dynamic Data Flow Analysis, processes the dynamic information to infer new test objectives, which Dynamic Data Flow Testing uses to generate new test cases. The test cases generated by Dynamic Data Flow Testing exercise relevant behaviors that are otherwise missed by both the original test suite and test suites that satisfy classic data flow criteria
Heap Reference Analysis Using Access Graphs
Despite significant progress in the theory and practice of program analysis,
analysing properties of heap data has not reached the same level of maturity as
the analysis of static and stack data. The spatial and temporal structure of
stack and static data is well understood while that of heap data seems
arbitrary and is unbounded. We devise bounded representations which summarize
properties of the heap data. This summarization is based on the structure of
the program which manipulates the heap. The resulting summary representations
are certain kinds of graphs called access graphs. The boundedness of these
representations and the monotonicity of the operations to manipulate them make
it possible to compute them through data flow analysis.
An important application which benefits from heap reference analysis is
garbage collection, where currently liveness is conservatively approximated by
reachability from program variables. As a consequence, current garbage
collectors leave a lot of garbage uncollected, a fact which has been confirmed
by several empirical studies. We propose the first ever end-to-end static
analysis to distinguish live objects from reachable objects. We use this
information to make dead objects unreachable by modifying the program. This
application is interesting because it requires discovering data flow
information representing complex semantics. In particular, we discover four
properties of heap data: liveness, aliasing, availability, and anticipability.
Together, they cover all combinations of directions of analysis (i.e. forward
and backward) and confluence of information (i.e. union and intersection). Our
analysis can also be used for plugging memory leaks in C/C++ languages.Comment: Accepted for printing by ACM TOPLAS. This version incorporates
referees' comment
- …