118,172 research outputs found

    Generalized Points-to Graphs: A New Abstraction of Memory in the Presence of Pointers

    Full text link
    Flow- and context-sensitive points-to analysis is difficult to scale; for top-down approaches, the problem centers on repeated analysis of the same procedure; for bottom-up approaches, the abstractions used to represent procedure summaries have not scaled while preserving precision. We propose a novel abstraction called the Generalized Points-to Graph (GPG) which views points-to relations as memory updates and generalizes them using the counts of indirection levels leaving the unknown pointees implicit. This allows us to construct GPGs as compact representations of bottom-up procedure summaries in terms of memory updates and control flow between them. Their compactness is ensured by the following optimizations: strength reduction reduces the indirection levels, redundancy elimination removes redundant memory updates and minimizes control flow (without over-approximating data dependence between memory updates), and call inlining enhances the opportunities of these optimizations. We devise novel operations and data flow analyses for these optimizations. Our quest for scalability of points-to analysis leads to the following insight: The real killer of scalability in program analysis is not the amount of data but the amount of control flow that it may be subjected to in search of precision. The effectiveness of GPGs lies in the fact that they discard as much control flow as possible without losing precision (i.e., by preserving data dependence without over-approximation). This is the reason why the GPGs are very small even for main procedures that contain the effect of the entire program. This allows our implementation to scale to 158kLoC for C programs

    BCFA: Bespoke Control Flow Analysis for CFA at Scale

    Full text link
    Many data-driven software engineering tasks such as discovering programming patterns, mining API specifications, etc., perform source code analysis over control flow graphs (CFGs) at scale. Analyzing millions of CFGs can be expensive and performance of the analysis heavily depends on the underlying CFG traversal strategy. State-of-the-art analysis frameworks use a fixed traversal strategy. We argue that a single traversal strategy does not fit all kinds of analyses and CFGs and propose bespoke control flow analysis (BCFA). Given a control flow analysis (CFA) and a large number of CFGs, BCFA selects the most efficient traversal strategy for each CFG. BCFA extracts a set of properties of the CFA by analyzing the code of the CFA and combines it with properties of the CFG, such as branching factor and cyclicity, for selecting the optimal traversal strategy. We have implemented BCFA in Boa, and evaluated BCFA using a set of representative static analyses that mainly involve traversing CFGs and two large datasets containing 287 thousand and 162 million CFGs. Our results show that BCFA can speedup the large scale analyses by 1%-28%. Further, BCFA has low overheads; less than 0.2%, and low misprediction rate; less than 0.01%.Comment: 12 page

    Inferring Types to Eliminate Ownership Checks in an Intentional JavaScript Compiler

    Get PDF
    Concurrent programs are notoriously difficult to develop due to the non-deterministic nature of thread scheduling. It is desirable to have a programming language to make such development easier. Tscript comprises such a system. Tscript is an extension of JavaScript that provides multithreading support along with intent specification. These intents allow a programmer to specify how parts of the program interact in a multithreaded context. However, enforcing intents requires run-time memory checks which can be inefficient. This thesis implements an optimization in the Tscript compiler that seeks to improve this inefficiency through static analysis. Our approach utilizes both type inference and dataflow analysis to eliminate unnecessary run-time checks

    Variability Abstractions: Trading Precision for Speed in Family-Based Analyses (Extended Version)

    Full text link
    Family-based (lifted) data-flow analysis for Software Product Lines (SPLs) is capable of analyzing all valid products (variants) without generating any of them explicitly. It takes as input only the common code base, which encodes all variants of a SPL, and produces analysis results corresponding to all variants. However, the computational cost of the lifted analysis still depends inherently on the number of variants (which is exponential in the number of features, in the worst case). For a large number of features, the lifted analysis may be too costly or even infeasible. In this paper, we introduce variability abstractions defined as Galois connections and use abstract interpretation as a formal method for the calculational-based derivation of approximate (abstracted) lifted analyses of SPL programs, which are sound by construction. Moreover, given an abstraction we define a syntactic transformation that translates any SPL program into an abstracted version of it, such that the analysis of the abstracted SPL coincides with the corresponding abstracted analysis of the original SPL. We implement the transformation in a tool, reconfigurator that works on Object-Oriented Java program families, and evaluate the practicality of this approach on three Java SPL benchmarks.Comment: 50 pages, 10 figure

    A Simple and Scalable Static Analysis for Bound Analysis and Amortized Complexity Analysis

    Full text link
    We present the first scalable bound analysis that achieves amortized complexity analysis. In contrast to earlier work, our bound analysis is not based on general purpose reasoners such as abstract interpreters, software model checkers or computer algebra tools. Rather, we derive bounds directly from abstract program models, which we obtain from programs by comparatively simple invariant generation and symbolic execution techniques. As a result, we obtain an analysis that is more predictable and more scalable than earlier approaches. Our experiments demonstrate that our analysis is fast and at the same time able to compute bounds for challenging loops in a large real-world benchmark. Technically, our approach is based on lossy vector addition systems (VASS). Our bound analysis first computes a lexicographic ranking function that proves the termination of a VASS, and then derives a bound from this ranking function. Our methodology achieves amortized analysis based on a new insight how lexicographic ranking functions can be used for bound analysis

    Scaling Bounded Model Checking By Transforming Programs With Arrays

    Full text link
    Bounded Model Checking is one the most successful techniques for finding bugs in program. However, model checkers are resource hungry and are often unable to verify programs with loops iterating over large arrays.We present a transformation that enables bounded model checkers to verify a certain class of array properties. Our technique transforms an array-manipulating (ANSI-C) program to an array-free and loop-free (ANSI-C) program thereby reducing the resource requirements of a model checker significantly. Model checking of the transformed program using an off-the-shelf bounded model checker simulates the loop iterations efficiently. Thus, our transformed program is a sound abstraction of the original program and is also precise in a large number of cases - we formally characterize the class of programs for which it is guaranteed to be precise. We demonstrate the applicability and usefulness of our technique on both industry code as well as academic benchmarks
    • …
    corecore