37 research outputs found
Value-Flow-Based Demand-Driven Pointer Analysis for C and C++
IEEE We present SUPA, a value-flow-based demand-driven flow- and context-sensitive pointer analysis with strong updates for C and C++ programs. SUPA enables computing points-to information via value-flow refinement, in environments with small time and memory budgets. We formulate SUPA by solving a graph-reachability problem on an inter-procedural value-flow graph representing a program's def-use chains, which are pre-computed efficiently but over-approximately. To answer a client query (a request for a variable's points-to set), SUPA reasons about the flow of values along the pre-computed def-use chains sparsely (rather than across all program points), by performing only the work necessary for the query (rather than analyzing the whole program). In particular, strong updates are performed to filter out spurious def-use chains through value-flow refinement as long as the total budget is not exhausted
Demand-Driven Pointer Analysis with Strong Updates via Value-Flow Refinement
We present a new demand-driven flow- and context-sensitive pointer analysis
with strong updates for C programs, called SUPA, that enables computing
points-to information via value-flow refinement, in environments with small
time and memory budgets such as IDEs. We formulate SUPA by solving a graph
reachability problem on an inter-procedural value-flow graph representing a
program's def-use chains, which are pre-computed efficiently but
over-approximately. To answer a client query (a request for a variable's
points-to set), SUPA reasons about the flow of values along the pre-computed
def-use chains sparsely (rather than across all program points), by performing
only the work necessary for the query (rather than analyzing the whole
program). In particular, strong updates are performed to filter out spurious
def-use chains through value-flow refinement as long as the total budget is not
exhausted. SUPA facilitates efficiency and precision tradeoffs by applying
different pointer analyses in a hybrid multi-stage analysis framework.
We have implemented SUPA in LLVM (3.5.0) and evaluate it by choosing
uninitialized pointer detection as a major client on 18 open-source C programs.
As the analysis budget increases, SUPA achieves improved precision, with its
single-stage flow-sensitive analysis reaching 97.4% of that achieved by
whole-program flow-sensitive analysis by consuming about 0.18 seconds and 65KB
of memory per query, on average (with a budget of at most 10000 value-flow
edges per query). With context-sensitivity also considered, SUPA's two- stage
analysis becomes more precise for some programs but also incurs more analysis
times. SUPA is also amenable to parallelization. A parallel implementation of
its single-stage flow-sensitive analysis achieves a speedup of up to 6.9x with
an average of 3.05x a 8-core machine with respect its sequential version
Precise Null Pointer Analysis Through Global Value Numbering
Precise analysis of pointer information plays an important role in many
static analysis techniques and tools today. The precision, however, must be
balanced against the scalability of the analysis. This paper focusses on
improving the precision of standard context and flow insensitive alias analysis
algorithms at a low scalability cost. In particular, we present a
semantics-preserving program transformation that drastically improves the
precision of existing analyses when deciding if a pointer can alias NULL. Our
program transformation is based on Global Value Numbering, a scheme inspired
from compiler optimizations literature. It allows even a flow-insensitive
analysis to make use of branch conditions such as checking if a pointer is NULL
and gain precision. We perform experiments on real-world code to measure the
overhead in performing the transformation and the improvement in the precision
of the analysis. We show that the precision improves from 86.56% to 98.05%,
while the overhead is insignificant.Comment: 17 pages, 1 section in Appendi
Symbol-Specific Sparsification of Interprocedural Distributive Environment Problems
Previous work has shown that one can often greatly speed up static analysis
by computing data flows not for every edge in the program's control-flow graph
but instead only along definition-use chains. This yields a so-called sparse
static analysis. Recent work on SparseDroid has shown that specifically taint
analysis can be "sparsified" with extraordinary effectiveness because the taint
state of one variable does not depend on those of others. This allows one to
soundly omit more flow-function computations than in the general case.
In this work, we now assess whether this result carries over to the more
generic setting of so-called Interprocedural Distributive Environment (IDE)
problems. Opposed to taint analysis, IDE comprises distributive problems with
large or even infinitely broad domains, such as typestate analysis or linear
constant propagation. Specifically, this paper presents Sparse IDE, a framework
that realizes sparsification for any static analysis that fits the IDE
framework.
We implement Sparse IDE in SparseHeros, as an extension to the popular Heros
IDE solver, and evaluate its performance on real-world Java libraries by
comparing it to the baseline IDE algorithm. To this end, we design, implement
and evaluate a linear constant propagation analysis client on top of
SparseHeros. Our experiments show that, although IDE analyses can only be
sparsified with respect to symbols and not (numeric) values, Sparse IDE can
nonetheless yield significantly lower runtimes and often also memory
consumptions compared to the original IDE.Comment: To be published in ICSE 202
Program analysis of temporal memory mismanagement
In the use of C/C++ programs, the performance benefits obtained from flexible low-level memory access and management sacrifice language-level support for memory safety and garbage collection. Memory-related programming mistakes are introduced as a result, rendering C/C++ programs prone to memory errors. A common category of programming mistakes is defined by the misplacement of deallocation operations, also known as temporal memory mismanagement, which can generate two types of bugs: (1) use-after-free (UAF) bugs and (2) memory leaks. The former are severe security vulnerabilities that expose programs to both data and control-flow exploits, while the latter are critical performance bugs that compromise software availability and reliability. In the case of UAF bugs, existing solutions that almost exclusively rely on dynamic analysis suffer from limitations, including low code coverage, binary incompatibility, and high overheads. In the case of memory leaks, detection techniques are abundant; however, fixing techniques have been poorly investigated.
In this thesis, we present three novel program analysis frameworks to address temporal memory mismanagement in C/C++. First, we introduce Tac, the first static UAF detection framework to combine typestate analysis with machine learning. Tac identifies representative features to train a Support Vector Machine to classify likely true/false UAF candidates, thereby providing guidance for typestate analysis used to locate bugs with precision.
We then present CRed, a pointer analysis-based framework for UAF detection with a novel context-reduction technique and a new demand-driven path-sensitive pointer analysis to boost scalability and precision. A major advantage of CRed is its ability to substantially and soundly reduce search space without losing bug-finding ability. This is achieved by utilizing must-not-alias information to truncate unnecessary segments of calling contexts.
Finally, we propose AutoFix, an automated memory leak fixing framework based on value-flow analysis and static instrumentation that can fix all leaks reported by any front-end detector with negligible overheads safely and with precision. AutoFix tolerates false leaks with a shadow memory data structure carefully designed to keep track of the allocation and deallocation of potentially leaked memory objects.
The contribution of this thesis is threefold. First, we advance existing state-of-the-art solutions to detecting memory leaks by proposing a series of novel program analysis techniques to address temporal memory mismanagement. Second, corresponding prototype tools are fully implemented in the LLVM compiler framework. Third, an extensive evaluation of open-source C/C++ benchmarks is conducted to validate the effectiveness of the proposed techniques
Boomerang: Demand-Driven Flow- and Context-Sensitive Pointer Analysis for Java
Many current program analyses require highly precise pointer
information about small, tar- geted parts of a given program. This
motivates the need for demand-driven pointer analyses that compute
information only where required. Pointer analyses generally compute
points-to sets of program variables or answer boolean alias
queries. However, many client analyses require richer pointer
information. For example, taint and typestate analyses often need to
know the set of all aliases of a given variable under a certain
calling context. With most current pointer analyses, clients must
compute such information through repeated points-to or alias queries, increasing complexity and computation time for them.
This paper presents Boomerang, a demand-driven, flow-, field-, and
context-sensitive pointer analysis for Java programs. Boomerang
computes rich results that include both the possible allocation sites of a given pointer (points-to information) and all pointers that can point to those allocation sites (alias information). For increased precision and scalability, clients can query Boomerang with respect to particular calling contexts of interest.
Our experiments show that Boomerang is more precise than existing
demand-driven pointer analyses. Additionally, using Boomerang, the
taint analysis FlowDroid issues up to 29.4x fewer pointer queries
compared to using other pointer analyses that return simpler pointer
infor- mation. Furthermore, the search space of Boomerang can be
significantly reduced by requesting calling contexts from the client
analysis
Logic Programming Applications: What Are the Abstractions and Implementations?
This article presents an overview of applications of logic programming,
classifying them based on the abstractions and implementations of logic
languages that support the applications. The three key abstractions are join,
recursion, and constraint. Their essential implementations are for-loops, fixed
points, and backtracking, respectively. The corresponding kinds of applications
are database queries, inductive analysis, and combinatorial search,
respectively. We also discuss language extensions and programming paradigms,
summarize example application problems by application areas, and touch on
example systems that support variants of the abstractions with different
implementations