497 research outputs found
Generalized Points-to Graphs: A New Abstraction of Memory in the Presence of Pointers
Flow- and context-sensitive points-to analysis is difficult to scale; for
top-down approaches, the problem centers on repeated analysis of the same
procedure; for bottom-up approaches, the abstractions used to represent
procedure summaries have not scaled while preserving precision.
We propose a novel abstraction called the Generalized Points-to Graph (GPG)
which views points-to relations as memory updates and generalizes them using
the counts of indirection levels leaving the unknown pointees implicit. This
allows us to construct GPGs as compact representations of bottom-up procedure
summaries in terms of memory updates and control flow between them. Their
compactness is ensured by the following optimizations: strength reduction
reduces the indirection levels, redundancy elimination removes redundant memory
updates and minimizes control flow (without over-approximating data dependence
between memory updates), and call inlining enhances the opportunities of these
optimizations. We devise novel operations and data flow analyses for these
optimizations.
Our quest for scalability of points-to analysis leads to the following
insight: The real killer of scalability in program analysis is not the amount
of data but the amount of control flow that it may be subjected to in search of
precision. The effectiveness of GPGs lies in the fact that they discard as much
control flow as possible without losing precision (i.e., by preserving data
dependence without over-approximation). This is the reason why the GPGs are
very small even for main procedures that contain the effect of the entire
program. This allows our implementation to scale to 158kLoC for C programs
BCFA: Bespoke Control Flow Analysis for CFA at Scale
Many data-driven software engineering tasks such as discovering programming
patterns, mining API specifications, etc., perform source code analysis over
control flow graphs (CFGs) at scale. Analyzing millions of CFGs can be
expensive and performance of the analysis heavily depends on the underlying CFG
traversal strategy. State-of-the-art analysis frameworks use a fixed traversal
strategy. We argue that a single traversal strategy does not fit all kinds of
analyses and CFGs and propose bespoke control flow analysis (BCFA). Given a
control flow analysis (CFA) and a large number of CFGs, BCFA selects the most
efficient traversal strategy for each CFG. BCFA extracts a set of properties of
the CFA by analyzing the code of the CFA and combines it with properties of the
CFG, such as branching factor and cyclicity, for selecting the optimal
traversal strategy. We have implemented BCFA in Boa, and evaluated BCFA using a
set of representative static analyses that mainly involve traversing CFGs and
two large datasets containing 287 thousand and 162 million CFGs. Our results
show that BCFA can speedup the large scale analyses by 1%-28%. Further, BCFA
has low overheads; less than 0.2%, and low misprediction rate; less than 0.01%.Comment: 12 page
Value-Flow-Based Demand-Driven Pointer Analysis for C and C++
IEEE We present SUPA, a value-flow-based demand-driven flow- and context-sensitive pointer analysis with strong updates for C and C++ programs. SUPA enables computing points-to information via value-flow refinement, in environments with small time and memory budgets. We formulate SUPA by solving a graph-reachability problem on an inter-procedural value-flow graph representing a program's def-use chains, which are pre-computed efficiently but over-approximately. To answer a client query (a request for a variable's points-to set), SUPA reasons about the flow of values along the pre-computed def-use chains sparsely (rather than across all program points), by performing only the work necessary for the query (rather than analyzing the whole program). In particular, strong updates are performed to filter out spurious def-use chains through value-flow refinement as long as the total budget is not exhausted
Flow-sensitive type-based heap cloning
By respecting program control-flow, flow-sensitive pointer analysis promises more precise results than its flow-insensitive counterpart. However, existing heap abstractions for C and C++ flow-sensitive pointer analyses model the heap by creating a single abstract heap object for each memory allocation. Two runtime heap objects which originate from the same allocation site are imprecisely modelled using one abstract object, which makes them share the same imprecise points-to sets and thus reduces the benefit of analysing heap objects flow-sensitively. On the other hand, equipping flow-sensitive analysis with context-sensitivity, whereby an abstract heap object would be created (cloned) per calling context, can yield a more precise heap model, but at the cost of uncontrollable analysis overhead when analysing larger programs. This paper presents TypeClone, a new type-based heap model for flow-sensitive analysis. Our key insight is to differentiate concrete heap objects lazily using type information at use sites within the program control-flow (e.g., when accessed via pointer dereferencing) for programs which conform to the strict aliasing rules set out by the C and C++ standards. The novelty of TypeClone lies in its lazy heap cloning: an untyped abstract heap object created at an allocation site is killed and replaced with a new object (i.e. a clone), uniquely identified by the type information at its use site, for flow-sensitive points-to propagation. Thus, heap cloning can be performed within a flow-sensitive analysis without the need for context-sensitivity. Moreover, TypeClone supports new kinds of strong updates for flow-sensitive analysis where heap objects are filtered out from imprecise points-to relations at object use sites according to the strict aliasing rules. Our method is neither strictly superior nor inferior to context-sensitive heap cloning, but rather, represents a new dimension that achieves a sweet spot between precision and efficiency. We evaluate our analysis by comparing TypeClone with state-of-the-art sparse flow-sensitive points-to analysis using the 12 largest programs in GNU Coreutils. Our experimental results also confirm that TypeClone is more precise than flow-sensitive pointer analysis and is able to, on average, answer over 15% more alias queries with a no-alias result
Efficient Flow-Sensitive Pointer Analysis on Full-Sparse Memory SSA
Pointer analysis is a fundamental research topic in computer science. It statically determines the potential runtime targets of pointers. Many clients benefit from this information, including compiler optimization, bug detection, security analysis and change impact analysis, etc. As a key dimension in pointer analysis, flow-sensitivity improves its precision by considering program execution order. Ideally, flow-sensitive pointer analysis should be performed by analyzing each program path independently. However, even ignoring the branch conditions, such solution remains intractable and extremely expensive for whole program analysis due to potentially unbounded program paths. Finding the right balance between precision and efficiency in flow-sensitive pointer analysis lies at the heart of pointer analysis.
In this thesis, we first introduce an efficient inter-procedural full-sparse memory SSA construction algorithm. Then we improve flow-sensitive pointer analysis based on the memory SSA with the following two contributions:
First, Selfs, a region-based selective flow-sensitive pointer analysis, is proposed to allow precision and efficiency trade-offs to be made according to region partitioning. By maintaining flow-sensitivity between regions instead of statements, Selfs is faster than the state-of-the-art full-sparse flow-sensitive analysis while achieving the same precision when used for alias queries. Second, Spas explores the intra-procedural path correlations on top of sparse flow-sensitive and context-sensitive pointer analysis. By using binary decision diagrams to represent the compact path conditions, Spas improves the precision of pointer analysis while only introducing a small overhead
Recommended from our members
Summary-Based Pointer Analysis Framework for Modular Bug Finding
Modern society is irreversibly dependent on computers and, consequently, on software. However, as the complexity of programs increase, so does the number of defects within them. To alleviate the problem, automated techniques are constantly used to improve software quality. Static analysis is one such approach in which violations of correctness properties are searched and reported. Static analysis has many advantages, but it is necessarily conservative because it symbolically executes the program instead of using real inputs, and it considers all possible executions simultaneously. Being conservative often means issuing false alarms, or missing real program errors. Pointer variables are a challenging aspect of many languages that can force static analysis tools to be overly conservative. It is often unclear what variables are affected by pointer-manipulating expressions, and aliasing between variables is one of the banes of program analysis. To alleviate that, a common solution is to allow the programmer to provide annotations such as declaring a variable as unaliased in a given scope, or providing special constructs such as the "never-null" pointer of Cyclone. However, programmers rarely keep these annotations up-to-date. The solution is to provide some form of pointer analysis, which derives useful information about pointer variables in the program. An appropriate pointer analysis equips the static tool so that it is capable of reporting more errors without risking too many false alarms. This dissertation proposes a methodology for pointer analysis that is specially tailored for "modular bug finding." It presents a new analysis space for pointer analysis, defined by finer-grain "dimensions of precision," which allows us to explore and evaluate a variety of different algorithms to achieve better trade-offs between analysis precision and efficiency. This framework is developed around a new abstraction for computing points-to sets, the Assign-Fetch Graph, that has many interesting features. Empirical evaluation shows promising results, as some unknown errors in well-known applications were discovered
Implementing sparse flow-sensitive Andersen analysis
Andersen's analysis is the most influential pointer analysis known so far. This paper, which contains parts of the author's upcoming PhD thesis, for the first time presents a flow-sensitive version of that analysis. We prove that the flow-sensitive version still has the same cubic complexity. Thus, the higher precision comes without loss of asymptotic scalability. This contradicts common wisdom of flow-sensitivity being substantially more expensive.
Compared to other flow-sensitive pointer analyses, we have no expensive data-flow problem on the CFG. Instead, we simply propagate pointer targets along data-flow relations which we determine during the analysis. Our analysis in fact combines the computation of the interprocedural SSA data-flow representation and the uncovering of pointer targets. It also integrates the computation of control-flow relations. The analysis thus presents a new, sparse approach for the flow-sensitive solution of the central problems for data-flow based program analyses.
This paper also presents two extensions for higher precision. The first extension shows how the analysis can detect strong updates without increasing the complexity. The second extension describes a context-sensitive version which excludes unrealizable paths. Together this yields the first analysis of that precision which only has a complexity of n^4. This is a substantial improvement over the previous n^6 bound found by Landi.
Thus, in summary this report describes several theoretical advances in the field of flow-sensitive pointer analysis. It also provides details on the algorithms used for incremental SSA construction and context-sensitive pointer propagation
- …