Generalized Points-to Graphs: A New Abstraction of Memory in the Presence of Pointers
Flow- and context-sensitive points-to analysis is difficult to scale; for
top-down approaches, the problem centers on repeated analysis of the same
procedure; for bottom-up approaches, the abstractions used to represent
procedure summaries have not scaled while preserving precision.
We propose a novel abstraction called the Generalized Points-to Graph (GPG)
which views points-to relations as memory updates and generalizes them using
the counts of indirection levels, leaving the unknown pointees implicit. This
allows us to construct GPGs as compact representations of bottom-up procedure
summaries in terms of memory updates and control flow between them. Their
compactness is ensured by the following optimizations: strength reduction
reduces the indirection levels, redundancy elimination removes redundant memory
updates and minimizes control flow (without over-approximating data dependence
between memory updates), and call inlining enhances the opportunities of these
optimizations. We devise novel operations and data flow analyses for these
optimizations.
Our quest for scalability of points-to analysis leads to the following
insight: The real killer of scalability in program analysis is not the amount
of data but the amount of control flow that it may be subjected to in search of
precision. The effectiveness of GPGs lies in the fact that they discard as much
control flow as possible without losing precision (i.e., by preserving data
dependence without over-approximation). This is the reason why the GPGs are
very small even for main procedures that contain the effect of the entire
program. This allows our implementation to scale to 158 kLoC for C programs.
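The idea of recording points-to relations as memory updates labelled with indirection-level counts can be sketched as follows. This is a simplified reading of the abstract, not the paper's implementation: the `Edge` type, the `compose` function, and the level convention (0 for an address, 1 for the variable, 2 for one dereference) are assumptions made for illustration, and the sketch assumes that the update `p` reaches `n` unchanged along every path, which it does not check.

```python
from typing import NamedTuple, Optional


class Edge(NamedTuple):
    """Memory update: level src_lev of src receives level dst_lev of dst.

    Assumed convention: level 0 is an address (&v), level 1 is the
    variable itself (v), level 2 is one dereference (*v).
    """
    src: str
    src_lev: int
    dst_lev: int
    dst: str


def compose(n: Edge, p: Edge) -> Optional[Edge]:
    """Rewrite n's use of p.src with the pointee information that p supplies.

    Returns None when n does not read p.src at a level that p defines.
    """
    if n.dst != p.src or n.dst_lev < p.src_lev:
        return None
    # Each level of p.src that p defines is traded for p's right-hand side.
    return Edge(n.src, n.src_lev, p.dst_lev + n.dst_lev - p.src_lev, p.dst)


y_gets_addr_a = Edge("y", 1, 0, "a")      # y = &a
x_gets_y = Edge("x", 1, 1, "y")           # x = y
x_gets_deref_y = Edge("x", 1, 2, "y")     # x = *y

print(compose(x_gets_y, y_gets_addr_a))        # x = &a
print(compose(x_gets_deref_y, y_gets_addr_a))  # x = a
```

In both calls the composed edge carries a lower indirection level on its right-hand side than the original, which is the sense in which the abstract speaks of strength reduction.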
BCFA: Bespoke Control Flow Analysis for CFA at Scale
Many data-driven software engineering tasks such as discovering programming
patterns, mining API specifications, etc., perform source code analysis over
control flow graphs (CFGs) at scale. Analyzing millions of CFGs can be
expensive and performance of the analysis heavily depends on the underlying CFG
traversal strategy. State-of-the-art analysis frameworks use a fixed traversal
strategy. We argue that a single traversal strategy does not fit all kinds of
analyses and CFGs and propose bespoke control flow analysis (BCFA). Given a
control flow analysis (CFA) and a large number of CFGs, BCFA selects the most
efficient traversal strategy for each CFG. BCFA extracts a set of properties of
the CFA by analyzing the code of the CFA and combines it with properties of the
CFG, such as branching factor and cyclicity, for selecting the optimal
traversal strategy. We have implemented BCFA in Boa, and evaluated BCFA using a
set of representative static analyses that mainly involve traversing CFGs and
two large datasets containing 287 thousand and 162 million CFGs. Our results
show that BCFA can speed up large-scale analyses by 1%-28%. Further, BCFA
has low overhead (less than 0.2%) and a low misprediction rate (less than 0.01%). Comment: 12 pages
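Choosing a traversal per CFG from its branching and cyclicity can be sketched as below. This is a minimal illustration and not BCFA's actual decision procedure: the CFG encoding (a dict from node to successor list), the `data_flow_sensitive` flag standing in for the properties extracted from the analysis code, and the strategy names are all assumptions.

```python
def has_branch(cfg):
    """True if some node has more than one successor."""
    return any(len(succs) > 1 for succs in cfg.values())


def has_cycle(cfg):
    """Depth-first search that reports a back edge, i.e. a loop in the CFG."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in cfg}

    def visit(node):
        color[node] = GREY
        for succ in cfg[node]:
            if color[succ] == GREY:
                return True
            if color[succ] == WHITE and visit(succ):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in cfg)


def select_strategy(cfg, data_flow_sensitive):
    """Pick a (hypothetical) traversal strategy for one CFG."""
    if not data_flow_sensitive:
        # The analysis looks at each node in isolation, so order is irrelevant.
        return "any-order single pass"
    if has_cycle(cfg):
        # Facts can flow around loops, so iterate until nothing changes.
        return "worklist until fixpoint"
    if has_branch(cfg):
        # Acyclic with joins: visit predecessors before successors, once.
        return "reverse postorder single pass"
    return "sequential single pass"


straight = {0: [1], 1: [2], 2: []}
diamond = {0: [1, 2], 1: [3], 2: [3], 3: []}
loop = {0: [1], 1: [2, 3], 2: [1], 3: []}

for name, cfg in [("straight", straight), ("diamond", diamond), ("loop", loop)]:
    print(name, "->", select_strategy(cfg, data_flow_sensitive=True))
```

The point of the sketch is only that cheap structural checks on each CFG can rule out the more expensive fixpoint iteration when it is not needed.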
Inferring Types to Eliminate Ownership Checks in an Intentional JavaScript Compiler
Concurrent programs are notoriously difficult to develop due to the non-deterministic nature of thread scheduling. It is desirable to have a programming language that makes such development easier. Tscript is such a language: an extension of JavaScript that provides multithreading support along with intent specification. These intents allow a programmer to specify how parts of the program interact in a multithreaded context. However, enforcing intents requires run-time memory checks, which can be inefficient. This thesis implements an optimization in the Tscript compiler that seeks to reduce this inefficiency through static analysis. Our approach uses both type inference and dataflow analysis to eliminate unnecessary run-time checks.
Variability Abstractions: Trading Precision for Speed in Family-Based Analyses (Extended Version)
Family-based (lifted) data-flow analysis for Software Product Lines (SPLs) is
capable of analyzing all valid products (variants) without generating any of
them explicitly. It takes as input only the common code base, which encodes all
variants of an SPL, and produces analysis results corresponding to all variants.
However, the computational cost of the lifted analysis still depends inherently
on the number of variants (which is exponential in the number of features, in
the worst case). For a large number of features, the lifted analysis may be too
costly or even infeasible. In this paper, we introduce variability abstractions
defined as Galois connections and use abstract interpretation as a formal
method for the calculational-based derivation of approximate (abstracted)
lifted analyses of SPL programs, which are sound by construction. Moreover,
given an abstraction we define a syntactic transformation that translates any
SPL program into an abstracted version of it, such that the analysis of the
abstracted SPL coincides with the corresponding abstracted analysis of the
original SPL. We implement the transformation in a tool, reconfigurator, that
works on Object-Oriented Java program families, and evaluate the practicality
of this approach on three Java SPL benchmarks. Comment: 50 pages, 10 figures
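The trade of precision for speed can be illustrated with a toy program family. The sketch below is an assumption-laden illustration, not the paper's formalization: the feature names, the three-statement `PROGRAM`, the set-of-values domain, and the function names are hypothetical, and the per-variant loop merely enumerates the results that a lifted analysis would represent. It compares one result per configuration with a single result in which all configurations are joined.

```python
from itertools import product

FEATURES = ("A", "B")
# All 2^n configurations; the sketch assumes every combination is valid.
CONFIGS = [frozenset(f for f, on in zip(FEATURES, bits) if on)
           for bits in product((False, True), repeat=len(FEATURES))]

# Each statement: (guarding feature or None, effect on the value of x).
PROGRAM = [
    (None, lambda x: 0),       # x = 0
    ("A", lambda x: x + 1),    # #ifdef A: x = x + 1
    ("B", lambda x: x * 2),    # #ifdef B: x = x * 2
]


def per_variant_analysis():
    """One set of possible values of x for each configuration."""
    result = {}
    for config in CONFIGS:
        values = {0}
        for feature, step in PROGRAM:
            if feature is None or feature in config:
                values = {step(v) for v in values}
        result[config] = values
    return result


def joined_analysis():
    """A single set for all variants: a guarded statement may or may not run."""
    values = {0}
    for feature, step in PROGRAM:
        after = {step(v) for v in values}
        values = after if feature is None else values | after
    return values


def alpha_join(lifted):
    """Abstraction: collapse the per-configuration results into one set."""
    return set().union(*lifted.values())


lifted = per_variant_analysis()
print({tuple(sorted(c)): sorted(v) for c, v in lifted.items()})
print(sorted(joined_analysis()))
```

The joined analysis keeps one value set instead of one per configuration, so its cost no longer grows with the number of variants, but it can no longer say which variant produces which value.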
A Simple and Scalable Static Analysis for Bound Analysis and Amortized Complexity Analysis
We present the first scalable bound analysis that achieves amortized
complexity analysis. In contrast to earlier work, our bound analysis is not
based on general purpose reasoners such as abstract interpreters, software
model checkers or computer algebra tools. Rather, we derive bounds directly
from abstract program models, which we obtain from programs by comparatively
simple invariant generation and symbolic execution techniques. As a result, we
obtain an analysis that is more predictable and more scalable than earlier
approaches. Our experiments demonstrate that our analysis is fast and at the
same time able to compute bounds for challenging loops in a large real-world
benchmark. Technically, our approach is based on lossy vector addition systems
(VASS). Our bound analysis first computes a lexicographic ranking function that
proves the termination of a VASS, and then derives a bound from this ranking
function. Our methodology achieves amortized analysis based on a new insight
into how lexicographic ranking functions can be used for bound analysis.
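A small loop shows what a lexicographic ranking function and an amortized bound look like in practice. The sketch below does not implement the paper's analysis: it only simulates a hypothetical two-counter loop, in which the random choice stands in for a non-deterministic guard, and checks the two properties at run time. The loop shape and the names `run` and `lexicographically_decreasing` are assumptions.

```python
import random


def run(n, seed=0):
    """Simulate the loop; return (outer steps, inner steps, trace of (i, j))."""
    rng = random.Random(seed)
    i, j = n, 0
    outer = inner = 0
    trace = [(i, j)]
    while i > 0:
        i, j = i - 1, j + 1          # outer transition: i decreases, j increases
        outer += 1
        trace.append((i, j))
        while j > 0 and rng.random() < 0.5:
            j -= 1                   # inner transition: i unchanged, j decreases
            inner += 1
            trace.append((i, j))
    return outer, inner, trace


def lexicographically_decreasing(trace):
    """Check that the pair (i, j) strictly decreases in lexicographic order."""
    return all(after < before for before, after in zip(trace, trace[1:]))


outer, inner, trace = run(10)
print(outer, inner <= 10, lexicographically_decreasing(trace))
```

The pair (i, j) decreases on every transition, which proves termination. Since j is incremented only by the outer transition, which runs n times, the inner loop runs at most n times in total rather than n times per outer iteration, which is the amortized bound.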
Scaling Bounded Model Checking By Transforming Programs With Arrays
Bounded Model Checking is one of the most successful techniques for finding bugs
in programs. However, model checkers are resource hungry and are often unable to
verify programs with loops iterating over large arrays. We present a
transformation that enables bounded model checkers to verify a certain class of
array properties. Our technique transforms an array-manipulating (ANSI-C)
program to an array-free and loop-free (ANSI-C) program thereby reducing the
resource requirements of a model checker significantly. Model checking of the
transformed program using an off-the-shelf bounded model checker simulates the
loop iterations efficiently. Thus, our transformed program is a sound
abstraction of the original program and is also precise in a large number of
cases - we formally characterize the class of programs for which it is
guaranteed to be precise. We demonstrate the applicability and usefulness of
our technique on both industry code and academic benchmarks.