497 research outputs found

    Generalized Points-to Graphs: A New Abstraction of Memory in the Presence of Pointers

    Full text link
    Flow- and context-sensitive points-to analysis is difficult to scale; for top-down approaches, the problem centers on repeated analysis of the same procedure; for bottom-up approaches, the abstractions used to represent procedure summaries have not scaled while preserving precision. We propose a novel abstraction called the Generalized Points-to Graph (GPG) which views points-to relations as memory updates and generalizes them using the counts of indirection levels leaving the unknown pointees implicit. This allows us to construct GPGs as compact representations of bottom-up procedure summaries in terms of memory updates and control flow between them. Their compactness is ensured by the following optimizations: strength reduction reduces the indirection levels, redundancy elimination removes redundant memory updates and minimizes control flow (without over-approximating data dependence between memory updates), and call inlining enhances the opportunities of these optimizations. We devise novel operations and data flow analyses for these optimizations. Our quest for scalability of points-to analysis leads to the following insight: The real killer of scalability in program analysis is not the amount of data but the amount of control flow that it may be subjected to in search of precision. The effectiveness of GPGs lies in the fact that they discard as much control flow as possible without losing precision (i.e., by preserving data dependence without over-approximation). This is the reason why the GPGs are very small even for main procedures that contain the effect of the entire program. This allows our implementation to scale to 158kLoC for C programs

    BCFA: Bespoke Control Flow Analysis for CFA at Scale

    Full text link
    Many data-driven software engineering tasks such as discovering programming patterns, mining API specifications, etc., perform source code analysis over control flow graphs (CFGs) at scale. Analyzing millions of CFGs can be expensive and performance of the analysis heavily depends on the underlying CFG traversal strategy. State-of-the-art analysis frameworks use a fixed traversal strategy. We argue that a single traversal strategy does not fit all kinds of analyses and CFGs and propose bespoke control flow analysis (BCFA). Given a control flow analysis (CFA) and a large number of CFGs, BCFA selects the most efficient traversal strategy for each CFG. BCFA extracts a set of properties of the CFA by analyzing the code of the CFA and combines it with properties of the CFG, such as branching factor and cyclicity, for selecting the optimal traversal strategy. We have implemented BCFA in Boa, and evaluated BCFA using a set of representative static analyses that mainly involve traversing CFGs and two large datasets containing 287 thousand and 162 million CFGs. Our results show that BCFA can speedup the large scale analyses by 1%-28%. Further, BCFA has low overheads; less than 0.2%, and low misprediction rate; less than 0.01%.Comment: 12 page

    Value-Flow-Based Demand-Driven Pointer Analysis for C and C++

    Full text link
    IEEE We present SUPA, a value-flow-based demand-driven flow- and context-sensitive pointer analysis with strong updates for C and C++ programs. SUPA enables computing points-to information via value-flow refinement, in environments with small time and memory budgets. We formulate SUPA by solving a graph-reachability problem on an inter-procedural value-flow graph representing a program's def-use chains, which are pre-computed efficiently but over-approximately. To answer a client query (a request for a variable's points-to set), SUPA reasons about the flow of values along the pre-computed def-use chains sparsely (rather than across all program points), by performing only the work necessary for the query (rather than analyzing the whole program). In particular, strong updates are performed to filter out spurious def-use chains through value-flow refinement as long as the total budget is not exhausted

    Flow-sensitive type-based heap cloning

    Full text link
    By respecting program control-flow, flow-sensitive pointer analysis promises more precise results than its flow-insensitive counterpart. However, existing heap abstractions for C and C++ flow-sensitive pointer analyses model the heap by creating a single abstract heap object for each memory allocation. Two runtime heap objects which originate from the same allocation site are imprecisely modelled using one abstract object, which makes them share the same imprecise points-to sets and thus reduces the benefit of analysing heap objects flow-sensitively. On the other hand, equipping flow-sensitive analysis with context-sensitivity, whereby an abstract heap object would be created (cloned) per calling context, can yield a more precise heap model, but at the cost of uncontrollable analysis overhead when analysing larger programs. This paper presents TypeClone, a new type-based heap model for flow-sensitive analysis. Our key insight is to differentiate concrete heap objects lazily using type information at use sites within the program control-flow (e.g., when accessed via pointer dereferencing) for programs which conform to the strict aliasing rules set out by the C and C++ standards. The novelty of TypeClone lies in its lazy heap cloning: an untyped abstract heap object created at an allocation site is killed and replaced with a new object (i.e. a clone), uniquely identified by the type information at its use site, for flow-sensitive points-to propagation. Thus, heap cloning can be performed within a flow-sensitive analysis without the need for context-sensitivity. Moreover, TypeClone supports new kinds of strong updates for flow-sensitive analysis where heap objects are filtered out from imprecise points-to relations at object use sites according to the strict aliasing rules. Our method is neither strictly superior nor inferior to context-sensitive heap cloning, but rather, represents a new dimension that achieves a sweet spot between precision and efficiency. We evaluate our analysis by comparing TypeClone with state-of-the-art sparse flow-sensitive points-to analysis using the 12 largest programs in GNU Coreutils. Our experimental results also confirm that TypeClone is more precise than flow-sensitive pointer analysis and is able to, on average, answer over 15% more alias queries with a no-alias result

    Value-Flow-Based Demand-Driven Pointer Analysis for C and C++

    Full text link

    Efficient Flow-Sensitive Pointer Analysis on Full-Sparse Memory SSA

    Full text link
    Pointer analysis is a fundamental research topic in computer science. It statically determines the potential runtime targets of pointers. Many clients benefit from this information, including compiler optimization, bug detection, security analysis and change impact analysis, etc. As a key dimension in pointer analysis, flow-sensitivity improves its precision by considering program execution order. Ideally, flow-sensitive pointer analysis should be performed by analyzing each program path independently. However, even ignoring the branch conditions, such solution remains intractable and extremely expensive for whole program analysis due to potentially unbounded program paths. Finding the right balance between precision and efficiency in flow-sensitive pointer analysis lies at the heart of pointer analysis. In this thesis, we first introduce an efficient inter-procedural full-sparse memory SSA construction algorithm. Then we improve flow-sensitive pointer analysis based on the memory SSA with the following two contributions: First, Selfs, a region-based selective flow-sensitive pointer analysis, is proposed to allow precision and efficiency trade-offs to be made according to region partitioning. By maintaining flow-sensitivity between regions instead of statements, Selfs is faster than the state-of-the-art full-sparse flow-sensitive analysis while achieving the same precision when used for alias queries. Second, Spas explores the intra-procedural path correlations on top of sparse flow-sensitive and context-sensitive pointer analysis. By using binary decision diagrams to represent the compact path conditions, Spas improves the precision of pointer analysis while only introducing a small overhead

    Implementing sparse flow-sensitive Andersen analysis

    Get PDF
    Andersen's analysis is the most influential pointer analysis known so far. This paper, which contains parts of the author's upcoming PhD thesis, for the first time presents a flow-sensitive version of that analysis. We prove that the flow-sensitive version still has the same cubic complexity. Thus, the higher precision comes without loss of asymptotic scalability. This contradicts common wisdom of flow-sensitivity being substantially more expensive. Compared to other flow-sensitive pointer analyses, we have no expensive data-flow problem on the CFG. Instead, we simply propagate pointer targets along data-flow relations which we determine during the analysis. Our analysis in fact combines the computation of the interprocedural SSA data-flow representation and the uncovering of pointer targets. It also integrates the computation of control-flow relations. The analysis thus presents a new, sparse approach for the flow-sensitive solution of the central problems for data-flow based program analyses. This paper also presents two extensions for higher precision. The first extension shows how the analysis can detect strong updates without increasing the complexity. The second extension describes a context-sensitive version which excludes unrealizable paths. Together this yields the first analysis of that precision which only has a complexity of n^4. This is a substantial improvement over the previous n^6 bound found by Landi. Thus, in summary this report describes several theoretical advances in the field of flow-sensitive pointer analysis. It also provides details on the algorithms used for incremental SSA construction and context-sensitive pointer propagation
    • …
    corecore