
    Interprocedural Data Flow Analysis in Soot using Value Contexts

    An interprocedural analysis is precise if it is flow-sensitive and fully context-sensitive even in the presence of recursion. Many methods of interprocedural analysis sacrifice precision for scalability, while some are precise but limited to only a certain class of problems. Soot currently supports interprocedural analysis of Java programs using graph reachability. However, this approach is restricted to IFDS/IDE problems and is not suitable for general data flow frameworks such as heap reference analysis and points-to analysis, which have non-distributive flow functions. We describe a general-purpose interprocedural analysis framework for Soot that uses data flow values for context-sensitivity. This framework is not restricted to problems with distributive flow functions, although the lattice must be finite. It combines the key ideas of the tabulation method of the functional approach and the technique of value-based termination of call string construction. The efficiency and precision of interprocedural analyses are heavily affected by the precision of the underlying call graph. This is especially important for object-oriented languages like Java, where virtual method invocations cause an explosion of spurious call edges if the call graph is constructed naively. We have instantiated our framework with a flow- and context-sensitive points-to analysis in Soot, which enables the construction of call graphs that are far more precise than those constructed by Soot's SPARK engine. Comment: SOAP 2013 Final Version
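
    The core of the value-context idea can be sketched in a few lines: exit values are tabulated per pair of procedure and entry data flow value, so calls with the same entry value share one result, and recursion is cut off because both the lattice and the set of contexts are finite. The sketch below is a minimal, hypothetical illustration in plain Java, not Soot's API or the authors' implementation: the Sign lattice, the lambda-based procedure model, and all class names are assumptions, and it uses naive repeated passes from the root where the paper's framework uses a worklist.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.BiFunction;

// A small finite lattice of signs: BOT below NEG, ZERO, POS, which are below TOP.
enum Sign {
    BOT, NEG, ZERO, POS, TOP;

    Sign join(Sign other) {
        if (this == other || other == BOT) return this;
        if (this == BOT) return other;
        return TOP;
    }
}

// A value context: a procedure paired with the data flow value at its entry.
record Context(String proc, Sign entry) {}

final class ValueContextAnalyzer {
    // Hypothetical procedure model: a body maps its entry value to its exit value
    // and may call other procedures (or itself) through call(...). Bodies are
    // assumed to be monotone.
    private final Map<String, BiFunction<Sign, ValueContextAnalyzer, Sign>> procs = new HashMap<>();
    // Tabulated exit value for every value context reached so far.
    private final Map<Context, Sign> exitTable = new HashMap<>();
    private final Set<Context> onStack = new HashSet<>();
    private final Set<Context> doneThisPass = new HashSet<>();
    private boolean changed;

    void define(String name, BiFunction<Sign, ValueContextAnalyzer, Sign> body) {
        procs.put(name, body);
    }

    // Analyze proc for one entry value. A recursive call in the same context, or
    // a context already evaluated in this pass, reuses the current tabulated
    // approximation instead of re-entering the body.
    Sign call(String proc, Sign entry) {
        Context ctx = new Context(proc, entry);
        if (onStack.contains(ctx) || doneThisPass.contains(ctx)) {
            return exitTable.getOrDefault(ctx, Sign.BOT);
        }
        onStack.add(ctx);
        Sign old = exitTable.getOrDefault(ctx, Sign.BOT);
        Sign updated = old.join(procs.get(proc).apply(entry, this));
        onStack.remove(ctx);
        doneThisPass.add(ctx);
        exitTable.put(ctx, updated);
        if (updated != old) changed = true;
        return updated;
    }

    // Repeat passes from the root until no tabulated exit value changes.
    Sign analyze(String root, Sign entry) {
        Sign result;
        do {
            changed = false;
            doneThisPass.clear();
            result = call(root, entry);
        } while (changed);
        return result;
    }

    int contextCount() {
        return exitTable.size();
    }
}

final class ValueContextDemo {
    static Sign negate(Sign s) {
        return switch (s) {
            case NEG -> Sign.POS;
            case POS -> Sign.NEG;
            default -> s;
        };
    }

    static ValueContextAnalyzer build() {
        ValueContextAnalyzer a = new ValueContextAnalyzer();
        a.define("id", (entry, an) -> entry);
        // Recursive procedure: f(v) = POS join f(negate(v)).
        a.define("f", (entry, an) -> Sign.POS.join(an.call("f", negate(entry))));
        return a;
    }

    public static void main(String[] args) {
        ValueContextAnalyzer a = build();
        System.out.println(a.analyze("id", Sign.POS)); // POS
        System.out.println(a.analyze("id", Sign.NEG)); // NEG
        System.out.println(a.analyze("f", Sign.POS));  // POS
    }
}
```

    Because id is analysed separately for POS and NEG, its two contexts keep distinct exit values instead of being merged into TOP, which is the precision a single context-insensitive summary would lose.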

    API Usage Verification Through Dataflow Analysis

    Using APIs in a program is often difficult because of incomplete documentation and the shortage of available examples. To cope with this, a growing number of API checking tools provide efficient suggestions for API usage. However, most of these checking tools use a pattern-based analysis to detect errors such as misuse of API calls. In this thesis, we introduce a different analysis technique that relies on explicit API state transitions for the analysis of the program. We adopt a static dataflow analysis framework from Soot to inspect state transitions at each program point.
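
    Checking explicit API state transitions with a dataflow analysis can be illustrated by a small typestate check. The sketch below is a standalone, hypothetical example rather than the thesis's Soot-based tool: the open/read/close protocol, the array-based CFG encoding, and all names are assumptions. It propagates the set of possible API states forward to a fixpoint and flags nodes where a call may be made in a state that has no valid transition.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Abstract states of a hypothetical file-like API; ERROR marks a protocol violation.
enum ApiState { CLOSED, OPEN, ERROR }

final class TypestateCheck {
    // Explicit state transitions for open, read, close; other statements leave the state alone.
    static ApiState step(ApiState s, String call) {
        if (s == ApiState.ERROR) return ApiState.ERROR;
        switch (call) {
            case "open":  return s == ApiState.CLOSED ? ApiState.OPEN : ApiState.ERROR;
            case "read":  return s == ApiState.OPEN ? ApiState.OPEN : ApiState.ERROR;
            case "close": return s == ApiState.OPEN ? ApiState.CLOSED : ApiState.ERROR;
            default:      return s;
        }
    }

    // Forward may-analysis over a CFG: calls[i] is the statement at node i,
    // succs[i] its successors, node 0 the entry. Returns the states that may
    // hold before each node.
    static List<Set<ApiState>> statesBefore(String[] calls, int[][] succs) {
        List<Set<ApiState>> in = new ArrayList<>();
        List<Set<ApiState>> out = new ArrayList<>();
        for (int i = 0; i < calls.length; i++) {
            in.add(EnumSet.noneOf(ApiState.class));
            out.add(EnumSet.noneOf(ApiState.class));
        }
        in.get(0).add(ApiState.CLOSED);
        Deque<Integer> worklist = new ArrayDeque<>(List.of(0));
        while (!worklist.isEmpty()) {
            int node = worklist.poll();
            Set<ApiState> result = EnumSet.noneOf(ApiState.class);
            for (ApiState s : in.get(node)) result.add(step(s, calls[node]));
            if (result.equals(out.get(node))) continue;
            out.set(node, result);
            for (int succ : succs[node]) {
                // Join is set union; revisit a successor only if its input grew.
                if (in.get(succ).addAll(result)) worklist.add(succ);
            }
        }
        return in;
    }

    // Nodes where a non-error state may step into ERROR, i.e. possible API misuse.
    static List<Integer> misuses(String[] calls, int[][] succs) {
        List<Set<ApiState>> in = statesBefore(calls, succs);
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < calls.length; i++) {
            for (ApiState s : in.get(i)) {
                if (s != ApiState.ERROR && step(s, calls[i]) == ApiState.ERROR) {
                    bad.add(i);
                    break;
                }
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // 0: open; 1: branch; 2: close; 3: read (reached with the file open or closed); 4: close
        String[] calls = {"open", "", "close", "read", "close"};
        int[][] succs = {{1}, {2, 3}, {3}, {4}, {}};
        System.out.println(misuses(calls, succs)); // [3]
    }
}
```

    Using a set of states joined by union makes this a may-analysis: node 3 is flagged because one incoming path reaches it with the file already closed.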

    Heap Abstractions for Static Analysis

    Heap data is potentially unbounded and seemingly arbitrary. As a consequence, unlike stack and static memory, heap memory cannot be abstracted directly in terms of a fixed set of source variable names appearing in the program being analysed. This makes it an interesting topic of study, and there is an abundance of literature employing heap abstractions. Although most studies have addressed similar concerns, their formulations and formalisms often seem dissimilar and sometimes even unrelated. Thus, the insights gained in one description of heap abstraction may not directly carry over to some other description. This survey is a result of our quest for a unifying theme in the existing descriptions of heap abstractions. In particular, our interest lies in the abstractions and not in the algorithms that construct them. In our search for a unifying theme, we view a heap abstraction as consisting of two features: a heap model to represent the heap memory and a summarization technique for bounding the heap representation. We classify the models as storeless, store-based, and hybrid. We describe various summarization techniques based on k-limiting, allocation sites, patterns, variables, other generic instrumentation predicates, and higher-order logics. This approach allows us to compare the insights of a large number of seemingly dissimilar heap abstractions and also paves the way for creating new abstractions by mix-and-match of models and summarization techniques. Comment: 49 pages, 20 figures
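
    As a rough illustration of k-limiting as a summarization technique, the sketch below bounds access paths (a storeless-style naming of heap cells such as x.next.next) to at most k field dereferences, so an unbounded family of concrete paths collapses into k + 2 abstract names. It is a hypothetical example, not taken from the survey; the ".*" suffix notation and the method names are assumptions.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class KLimiting {
    // Keep at most k field dereferences of an access path; a trailing ".*"
    // summarizes all longer paths that share the same k-length prefix.
    static String limit(String base, List<String> fields, int k) {
        StringBuilder sb = new StringBuilder(base);
        for (int i = 0; i < Math.min(k, fields.size()); i++) {
            sb.append('.').append(fields.get(i));
        }
        if (fields.size() > k) sb.append(".*");
        return sb.toString();
    }

    // Abstract names for x, x.f, x.f.f, ... up to maxLen dereferences
    // (e.g. the cells reached while walking a linked list).
    static Set<String> abstractNames(String base, String field, int maxLen, int k) {
        Set<String> names = new LinkedHashSet<>();
        for (int len = 0; len <= maxLen; len++) {
            names.add(limit(base, Collections.nCopies(len, field), k));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(limit("x", List.of("next", "next", "next"), 2)); // x.next.next.*
        System.out.println(abstractNames("x", "next", 1000, 2));
        // [x, x.next, x.next.next, x.next.next.*]
    }
}
```

    The price of the bound is precision: every cell beyond depth k shares the single name x.next.next.*, so facts about those cells are merged.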

    Generalized Points-to Graphs: A New Abstraction of Memory in the Presence of Pointers

    Flow- and context-sensitive points-to analysis is difficult to scale: for top-down approaches, the problem centers on repeated analysis of the same procedure; for bottom-up approaches, the abstractions used to represent procedure summaries have not scaled while preserving precision. We propose a novel abstraction called the Generalized Points-to Graph (GPG), which views points-to relations as memory updates and generalizes them using the counts of indirection levels, leaving the unknown pointees implicit. This allows us to construct GPGs as compact representations of bottom-up procedure summaries in terms of memory updates and the control flow between them. Their compactness is ensured by the following optimizations: strength reduction reduces the indirection levels, redundancy elimination removes redundant memory updates and minimizes control flow (without over-approximating data dependence between memory updates), and call inlining enhances the opportunities for these optimizations. We devise novel operations and data flow analyses for these optimizations. Our quest for scalability of points-to analysis leads to the following insight: the real killer of scalability in program analysis is not the amount of data but the amount of control flow that it may be subjected to in search of precision. The effectiveness of GPGs lies in the fact that they discard as much control flow as possible without losing precision (i.e., by preserving data dependence without over-approximation). This is why GPGs are very small even for main procedures that contain the effect of the entire program. This allows our implementation to scale to 158 kLoC for C programs.
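
    The strength-reduction idea can be sketched with a simplified encoding of a generalized points-to update as (source, i, j, target), where i and j are the indirection levels of the left- and right-hand sides, following the GPG notation as we understand it. The Java sketch below handles only one case: rewriting a later update using an earlier definite assignment to a plain variable. It is not the authors' algorithm, which also covers may-updates, other composition cases, and control flow between updates; the record and method names are hypothetical.

```java
// A generalized points-to update (gpu) src --i|j--> tgt: the locations reached
// by (i - 1) dereferences of src are made to point to the locations reached by
// j dereferences of tgt, where j = 0 stands for tgt's own address.
//   x = &a  ->  (x, 1, 0, a)     x = *y  ->  (x, 1, 2, y)
//   x = y   ->  (x, 1, 1, y)     *x = y  ->  (x, 2, 1, y)
record Gpu(String src, int i, int j, String tgt) {
    // Render the update as a C-like assignment for readability.
    String show() {
        String lhs = "*".repeat(i - 1) + src;
        String rhs = j == 0 ? "&" + tgt : "*".repeat(j - 1) + tgt;
        return lhs + " = " + rhs;
    }
}

final class StrengthReduction {
    // Simplified reduction of a later update q using an earlier update p,
    // assuming p definitely defines the variable p.src (p.i == 1) and nothing
    // redefines p.src in between. The result is expressed in terms of the
    // values that held before p.
    static Gpu reduce(Gpu q, Gpu p) {
        if (p.i() != 1) return q;
        Gpu r = q;
        // q reads through p.src: splice in p's right-hand side, adjusting j.
        if (q.tgt().equals(p.src()) && q.j() >= 1) {
            r = new Gpu(r.src(), r.i(), q.j() - 1 + p.j(), p.tgt());
        }
        // q writes through p.src: splice in p's right-hand side, adjusting i.
        if (q.src().equals(p.src()) && q.i() >= 2) {
            r = new Gpu(p.tgt(), q.i() - 1 + p.j(), r.j(), r.tgt());
        }
        return r;
    }

    public static void main(String[] args) {
        Gpu p = new Gpu("x", 1, 0, "a");                               // x = &a
        System.out.println(reduce(new Gpu("y", 1, 2, "x"), p).show()); // y = a
        System.out.println(reduce(new Gpu("x", 2, 1, "b"), p).show()); // a = b
    }
}
```

    In both results the indirection level drops and the dependence on x disappears, which is the kind of simplification strength reduction aims for.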

    Flow- and context-sensitive points-to analysis using generalized points-to graphs

    © Springer-Verlag GmbH Germany 2016. Bottom-up interprocedural methods of program analysis construct summary flow functions for procedures to capture the effect of their calls and have been used effectively for many analyses. However, these methods seem computationally expensive for flow- and context-sensitive points-to analysis (FCPA), which requires modelling unknown locations accessed indirectly through pointers. Such accesses are commonly handled by using placeholders to explicate unknown locations or by using multiple call-specific summary flow functions. We generalize the concept of points-to relations by using the counts of indirection levels, leaving the unknown locations implicit. This allows us to create summary flow functions in the form of generalized points-to graphs (GPGs) without the need for placeholders. By design, GPGs represent both memory (in terms of classical points-to facts) and memory transformers (in terms of generalized points-to facts). We perform FCPA by progressively reducing generalized points-to facts to classical points-to facts. GPGs distinguish between may and must pointer updates, thereby facilitating strong updates within calling contexts. The size of GPGs is linearly bounded by the number of variables and is independent of the number of statements. Empirical measurements on SPEC benchmarks show that GPGs are indeed compact in spite of large procedure sizes. This allows us to scale FCPA to 158 kLoC using GPGs (compared to 35 kLoC reported by liveness-based FCPA). Thus GPGs hold a promise of efficiency and scalability for FCPA without compromising precision.
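
    Why the may/must distinction matters can be seen in the classic strong-versus-weak update rule for an indirect assignment. The sketch below is a generic points-to illustration in Java, not the GPG representation: the PointsTo class and its methods are hypothetical, and it assumes that a singleton points-to set denotes a single concrete location that p definitely points to.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Flow-sensitive points-to facts at one program point. Every name is assumed
// to be a single concrete location (no summary or heap nodes).
final class PointsTo {
    private final Map<String, Set<String>> pts = new HashMap<>();

    Set<String> of(String v) {
        return pts.getOrDefault(v, Set.of());
    }

    // x = &y: x itself is overwritten, so the old pointees of x are killed.
    void addressOf(String x, String y) {
        pts.put(x, new HashSet<>(Set.of(y)));
    }

    // Hypothetical helper modelling a control-flow merge: x may point to any of locs.
    void mayPointTo(String x, String... locs) {
        pts.put(x, new HashSet<>(List.of(locs)));
    }

    // *p = &y: a strong update (kill) when p's points-to set is a singleton,
    // treated here as a must fact; otherwise a weak update that keeps the old
    // pointees of every may-target.
    void storeAddress(String p, String y) {
        List<String> targets = new ArrayList<>(of(p)); // copy: the loop may modify p's own set
        if (targets.size() == 1) {
            addressOf(targets.get(0), y);
        } else {
            for (String t : targets) {
                pts.computeIfAbsent(t, k -> new HashSet<>()).add(y);
            }
        }
    }

    public static void main(String[] args) {
        PointsTo g = new PointsTo();
        g.addressOf("p", "x");       // p -> {x}
        g.addressOf("x", "a");       // x -> {a}
        g.storeAddress("p", "b");    // strong update: x -> {b}
        g.mayPointTo("q", "x", "y"); // q -> {x, y}
        g.addressOf("y", "c");       // y -> {c}
        g.storeAddress("q", "d");    // weak update: x -> {b, d}, y -> {c, d}
        System.out.println(g.of("x").equals(Set.of("b", "d")) && g.of("y").equals(Set.of("c", "d"))); // true
    }
}
```

    With only may information, every indirect store would have to take the weak branch and old pointees would accumulate; knowing that an update is a must update is what allows the kill.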