Efficient Computation of Parameterized Pointer Information for Interprocedural Analyses
The pointer information provided by many algorithms
identifies a memory location by the same name throughout
a program. Such pointer information is ill-suited to analyzing
C programs because a program analysis that uses it may
propagate a large amount of spurious information across
procedure boundaries. This paper
presents a modular algorithm that efficiently computes
parameterized pointer information in which symbolic names
are introduced to identify memory locations whose addresses
may be passed into a procedure. Because a symbolic name
may identify different memory locations when the procedure
is invoked at different call sites, using parameterized
pointer information can help a program analysis reduce the
spurious information that is propagated across procedure
boundaries. The paper also presents a set of empirical studies
that demonstrate (a) the efficiency of the algorithm and
(b) the benefits of using parameterized pointer information
over using non-parameterized pointer information in program
analyses. The studies show that using parameterized
pointer information may significantly improve the precision
and the efficiency of many program analyses.
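The effect described above can be illustrated with a small, purely schematic Python model (all names here are invented for illustration): a procedure f(p) that writes through its pointer parameter is summarized once, and the summary is either merged over all call sites or parameterized by a symbolic name that each call site instantiates.

```python
# Schematic model (invented names): procedure f(p) writes through p.
# We compare two ways of summarizing which locations f may modify.

# Call sites: f(&a) at site 1, f(&b) at site 2.
call_sites = {1: "a", 2: "b"}

# Non-parameterized: one global name for p's pointee, merged over all
# call sites, so each site appears to modify both locations.
merged_pointees = set(call_sites.values())            # {'a', 'b'}
mod_nonparam = {site: merged_pointees for site in call_sites}

# Parameterized: the symbolic name 'phi_p' stands for p's pointee in
# the summary; each call site instantiates it with its own actual.
summary_mod = {"phi_p"}                                # f modifies *p only
mod_param = {site: {actual if name == "phi_p" else name
                    for name in summary_mod}
             for site, actual in call_sites.items()}

print(sorted(mod_nonparam[1]))   # ['a', 'b'] -- spurious 'b' at site 1
print(sorted(mod_param[1]))      # ['a']      -- precise
```

The spurious fact at site 1 is exactly the cross-call-site pollution the abstract describes; the parameterized summary avoids it without reanalyzing f per call site.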
Structural Analysis: Shape Information via Points-To Computation
This paper introduces a new hybrid memory analysis, Structural Analysis,
which combines an expressive shape analysis style abstract domain with
efficient and simple points-to style transfer functions. Using data from
empirical studies on the runtime heap structures and the programmatic idioms
used in modern object-oriented languages we construct a heap analysis with the
following characteristics: (1) it can express a rich set of structural, shape,
and sharing properties which are not provided by a classic points-to analysis
and that are useful for optimization and error detection applications; (2) it
uses efficient, weakly-updating, set-based transfer functions which enable the
analysis to be more robust and scalable than a shape analysis; and (3) it can be
used as the basis for a scalable interprocedural analysis that produces precise
results in practice.
The analysis has been implemented for .NET bytecode, and using this
implementation we evaluate both the runtime cost and the precision of the
results on a number of well-known benchmarks and real-world programs. Our
experimental evaluations show that the domain defined in this paper is capable
of precisely expressing the majority of the connectivity, shape, and sharing
properties that occur in practice and, despite the use of weak updates, the
static analysis is able to precisely approximate the ideal results. The
analysis is capable of analyzing large real-world programs (over 30K bytecodes)
in less than 65 seconds and using less than 130MB of memory. In summary this
work presents a new type of memory analysis that advances the state of the art
with respect to expressive power, precision, and scalability and represents a
new area of study on the relationships between and combination of concepts from
shape and points-to analyses.
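A rough illustration of the second characteristic above, a weakly-updating, set-based transfer function, can be sketched in a few lines of Python. The graph encoding and names below are assumptions of this sketch, not the paper's implementation:

```python
# Illustrative sketch: set-based, weakly-updating transfer function
# for a points-to graph.  A weak update on "x.f = y" adds edges
# instead of replacing them, which keeps the transfer function
# simple and monotone -- at the cost of some precision.

def assign_field_weak(graph, x, f, y):
    """Weak update for 'x.f = y': every node x may point to gains
    the targets of y on field f, without killing old edges."""
    for node in graph.get(x, set()):
        graph.setdefault((node, f), set()).update(graph.get(y, set()))

heap = {"x": {"o1", "o2"}, "y": {"o3"}, ("o1", "f"): {"o9"}}
assign_field_weak(heap, "x", "f", "y")
print(sorted(heap[("o1", "f")]))   # ['o3', 'o9'] -- old edge survives
print(heap[("o2", "f")])           # {'o3'}
```

A shape analysis would try a strong update here (killing the o1.f -> o9 edge when possible); the point of the hybrid design is that the cheap weak update above still approximates the ideal result closely in practice.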
Parameterized Algorithms for Scalable Interprocedural Data-flow Analysis
Data-flow analysis is a general technique used to compute information of
interest at different points of a program and is considered to be a cornerstone
of static analysis. In this thesis, we consider interprocedural data-flow
analysis as formalized by the standard IFDS framework, which can express many
widely-used static analyses such as reaching definitions, live variables, and
null-pointer dereference. We focus on the well-studied on-demand setting in which queries
arrive one-by-one in a stream and each query should be answered as fast as
possible. While the classical IFDS algorithm provides a polynomial-time
solution to this problem, it is not scalable in practice. Specifically, it
either requires a quadratic-time preprocessing phase or takes linear time per
query, both of which are untenable for modern huge codebases with hundreds of
thousands of lines. Previous works have already shown that parameterizing the
problem by the treewidth of the program's control-flow graph is promising and
can lead to significant gains in efficiency. Unfortunately, these results were
only applicable to the limited special case of same-context queries.
In this work, we obtain significant speedups for the general case of
on-demand IFDS with queries that are not necessarily same-context. This is
achieved by exploiting a new graph sparsity parameter, namely the treedepth of
the program's call graph. Our approach is the first to exploit the sparsity of
control-flow graphs and call graphs at the same time and parameterize by both
treewidth and treedepth. We obtain an algorithm with a linear preprocessing
phase that can answer each query in constant time with respect to the input
size. Finally, we show experimental results demonstrating that our approach
significantly outperforms the classical IFDS algorithm and its on-demand variant.
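In the IFDS view, a data-flow query reduces to reachability in an "exploded" graph whose nodes pair a program point with a fact. The hand-built graph below is only meant to show what one on-demand query looks like; the real IFDS algorithm tabulates procedure summaries rather than searching a prebuilt graph:

```python
# Schematic on-demand IFDS-style query as graph reachability.
# Fact 0 is the special "Lambda" fact that is always reachable.
from collections import deque

# Exploded edges: (point, fact) -> [(point', fact'), ...]
edges = {
    ("entry", 0): [("n1", 0), ("n1", "x_def")],   # x defined at n1
    ("n1", 0): [("n2", 0)],
    ("n1", "x_def"): [("n2", "x_def")],           # definition survives n1->n2
    ("n2", 0): [("exit", 0)],
    ("n2", "x_def"): [],                          # definition killed at n2
}

def on_demand_query(target):
    """Answer one query 'does fact d hold at point n' by a forward
    search from the entry node -- linear time per query."""
    seen, work = {("entry", 0)}, deque([("entry", 0)])
    while work:
        node = work.popleft()
        if node == target:
            return True
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return False

print(on_demand_query(("n2", "x_def")))    # True: the def reaches n2
print(on_demand_query(("exit", "x_def")))  # False: killed before exit
```

The thesis's contribution is avoiding exactly this per-query linear search (or its quadratic precomputation) by exploiting treewidth and treedepth to answer queries in constant time after linear preprocessing.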
Automatically Finding Bugs in Open Source Programs
We consider properties desirable for static analysis tools aimed at finding bugs in real open-source code, and review tools based on various approaches to defect detection. We then describe a static analysis tool that includes a framework for flow-sensitive interprocedural dataflow analysis and scales to the analysis of large
programs. The framework enables the implementation of multiple checkers that search for specific bugs, such as null-pointer dereferences and buffer overflows, abstracting details such as alias analysis away from the checkers.
Expression-based aliasing for OO-languages
Alias analysis has long been an interesting research topic in the verification
and optimization of programs. The undecidability of determining whether two
expressions in a program may refer to the same object is the main source of
the challenges in alias analysis. In this paper we propose an extension
of a previously introduced alias calculus based on program expressions to the
setting of unbounded program executions, such as infinite loops and recursive
calls. Moreover, we devise a corresponding executable specification in the
K-framework. An important property of our extension is that, in a
non-concurrent setting, the corresponding alias expressions can be
over-approximated in terms of a notion of regular expressions. This further
enables us to show that the associated K-machinery implements an algorithm that
always stops and provides a sound over-approximation of the "may aliasing"
information, where soundness stands for the lack of false negatives. As a case
study, we analyze the integration and further applications of the alias
calculus in SCOOP. The latter is an object-oriented programming model for
concurrency, recently formalized in Maude; K-definitions can be compiled into
Maude for execution.
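A toy rendering of the underlying idea (not the calculus itself) treats "may alias" as a set of unordered pairs of program expressions, updated by assignments and over-approximated for unbounded loops by iterating to a fixpoint:

```python
# Toy expression-pair may-alias sketch (invented encoding).
def alias_after_assign(aliases, lhs, rhs):
    """After 'lhs := rhs': lhs may alias rhs and whatever rhs aliased;
    old pairs involving lhs are dropped (a strong update)."""
    kept = {p for p in aliases if lhs not in p}
    new = {frozenset((lhs, rhs))}
    new |= {frozenset((lhs, e)) for p in kept if rhs in p
                                for e in p if e != rhs}
    return kept | new

def loop_fixpoint(aliases, body_assigns):
    """Over-approximate an unbounded loop: accumulate the effect of
    the body until no new alias pair appears.  Sound (no false
    negatives) but imprecise, matching the abstract's guarantee."""
    state = set(aliases)
    while True:
        nxt = set(state)
        for lhs, rhs in body_assigns:
            nxt |= alias_after_assign(state, lhs, rhs)
        if nxt == state:
            return state
        state = nxt

a0 = alias_after_assign(set(), "x", "y")     # after 'x := y'
loop = loop_fixpoint(a0, [("z", "x")])       # loop body: 'z := x'
print(frozenset(("x", "y")) in a0)           # True
print(frozenset(("z", "y")) in loop)         # True (via z ~ x ~ y)
```

The paper's regular-expression over-approximation plays the role of this fixpoint for genuinely unbounded executions, which is what guarantees the K-machinery's termination.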
Heap Abstractions for Static Analysis
Heap data is potentially unbounded and seemingly arbitrary. As a consequence,
unlike stack and static memory, heap memory cannot be abstracted directly in
terms of a fixed set of source variable names appearing in the program being
analysed. This makes it an interesting topic of study and there is an abundance
of literature employing heap abstractions. Although most studies have addressed
similar concerns, their formulations and formalisms often seem dissimilar and
sometimes even unrelated. Thus, the insights gained in one description of heap
abstraction may not directly carry over to some other description. This survey
is a result of our quest for a unifying theme in the existing descriptions of
heap abstractions. In particular, our interest lies in the abstractions and not
in the algorithms that construct them.
In our search of a unified theme, we view a heap abstraction as consisting of
two features: a heap model to represent the heap memory and a summarization
technique for bounding the heap representation. We classify the models as
storeless, store based, and hybrid. We describe various summarization
techniques based on k-limiting, allocation sites, patterns, variables, other
generic instrumentation predicates, and higher-order logics. This approach
allows us to compare the insights of a large number of seemingly dissimilar
heap abstractions and also paves the way for creating new abstractions by
mixing and matching models and summarization techniques.
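One of the summarization techniques the survey classifies, allocation-site abstraction in a store-based model, can be sketched concretely: every object allocated at the same program point collapses into one summary node, which bounds an unbounded heap. The event encoding below is this sketch's assumption, not the survey's formalism.

```python
# Allocation-site summarization of a concrete heap (invented encoding).
def build_abstract_heap(trace):
    """trace: events (alloc_site, obj_id, field, target_obj_id);
    field is None for a plain allocation event.  Returns abstract
    edges (site, field, site) between allocation-site summary nodes."""
    site_of = {}
    for site, obj, _, _ in trace:
        site_of.setdefault(obj, site)        # first event names the site
    edges = set()
    for _, obj, field, target in trace:
        if field is not None:
            edges.add((site_of[obj], field, site_of[target]))
    return edges

# Two list cells from the same 'new' at site L7, linked via 'next':
trace = [("L7", "c1", None, None),
         ("L7", "c2", None, None),
         ("L3", "hd", None, None),
         ("L3", "hd", "first", "c1"),
         ("L7", "c1", "next", "c2")]
print(sorted(build_abstract_heap(trace)))
# [('L3', 'first', 'L7'), ('L7', 'next', 'L7')]
```

The self-loop on the L7 node is how a finite abstraction represents a list of unbounded length; the other techniques the survey lists (k-limiting, patterns, variables, instrumentation predicates) are alternative ways of drawing the same kind of finite picture.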
Flow- and context-sensitive points-to analysis using generalized points-to graphs
© Springer-Verlag GmbH Germany 2016. Bottom-up interprocedural methods of program analysis construct summary flow functions for procedures to capture the effect of their calls and have been used effectively for many analyses. However, these methods seem computationally expensive for flow- and context-sensitive points-to analysis (FCPA), which requires modelling unknown locations accessed indirectly through pointers. Such accesses are commonly handled by using placeholders to explicate unknown locations or by using multiple call-specific summary flow functions. We generalize the concept of points-to relations by using the counts of indirection levels, leaving the unknown locations implicit. This allows us to create summary flow functions in the form of generalized points-to graphs (GPGs) without the need of placeholders. By design, GPGs represent both memory (in terms of classical points-to facts) and memory transformers (in terms of generalized points-to facts). We perform FCPA by progressively reducing generalized points-to facts to classical points-to facts. GPGs distinguish between may and must pointer updates, thereby facilitating strong updates within calling contexts. The size of GPGs is linearly bounded by the number of variables and is independent of the number of statements. Empirical measurements on SPEC benchmarks show that GPGs are indeed compact in spite of large procedure sizes. This allows us to scale FCPA to 158 kLoC using GPGs (compared to 35 kLoC reported by liveness-based FCPA). Thus GPGs hold a promise of efficiency and scalability for FCPA without compromising precision.
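The indirection-count idea can be caricatured in a few lines of Python. This is a deliberate simplification of GPGs, and the encoding is this sketch's assumption, not the paper's: a generalized fact (x, i, y, j) says that the location reached by i-1 dereferences of x points to the location reached by j dereferences of y, so the classical fact "x points to y" is (x, 1, y, 0), and reduction lowers indirection counts using known classical facts.

```python
# Simplified generalized points-to facts with indirection counts.
def locs(pts, v, k):
    """Locations reached from v after k dereferences under the
    classical points-to map pts."""
    frontier = {v}
    for _ in range(k):
        frontier = {t for s in frontier for t in pts.get(s, set())}
    return frontier

def reduce_edge(pts, edge):
    """Lower a generalized edge (x, i, y, j) to classical facts
    (l, 1, r, 0), i.e. 'l points to r'."""
    x, i, y, j = edge
    return {(l, 1, r, 0) for l in locs(pts, x, i - 1)
                         for r in locs(pts, y, j)}

pts = {"p": {"a"}, "q": {"b"}}              # classical: p -> a, q -> b
# The statement '*p = q' is the generalized fact (p, 2, q, 1):
print(reduce_edge(pts, ("p", 2, "q", 1)))   # {('a', 1, 'b', 0)}: a -> b
```

No placeholder variable was needed for the unknown pointee of p: the count 2 carries that information until the calling context supplies the classical fact p -> a, which is the placeholder-free reduction the abstract describes.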