Search CORE

21 research outputs found

Interprocedural Data Flow Analysis in Soot using Value Contexts

Author: Khedker Uday P.
Padhye Rohan
Publication venue
Publication date: 01/01/2013
Field of study

An interprocedural analysis is precise if it is flow sensitive and fully context-sensitive even in the presence of recursion. Many methods of interprocedural analysis sacrifice precision for scalability while some are precise but limited to only a certain class of problems. Soot currently supports interprocedural analysis of Java programs using graph reachability. However, this approach is restricted to IFDS/IDE problems, and is not suitable for general data flow frameworks such as heap reference analysis and points-to analysis which have non-distributive flow functions. We describe a general-purpose interprocedural analysis framework for Soot using data flow values for context-sensitivity. This framework is not restricted to problems with distributive flow functions, although the lattice must be finite. It combines the key ideas of the tabulation method of the functional approach and the technique of value-based termination of call string construction. The efficiency and precision of interprocedural analyses is heavily affected by the precision of the underlying call graph. This is especially important for object-oriented languages like Java where virtual method invocations cause an explosion of spurious call edges if the call graph is constructed naively. We have instantiated our framework with a flow and context-sensitive points-to analysis in Soot, which enables the construction of call graphs that are far more precise than those constructed by Soot's SPARK engine.Comment: SOAP 2013 Final Versio

arXiv.org e-Print Archive

Crossref

Heap Abstractions for Static Analysis

Author: Kanvar Vini
Khedker Uday P.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 13/05/2015
Field of study

Heap data is potentially unbounded and seemingly arbitrary. As a consequence, unlike stack and static memory, heap memory cannot be abstracted directly in terms of a fixed set of source variable names appearing in the program being analysed. This makes it an interesting topic of study and there is an abundance of literature employing heap abstractions. Although most studies have addressed similar concerns, their formulations and formalisms often seem dissimilar and some times even unrelated. Thus, the insights gained in one description of heap abstraction may not directly carry over to some other description. This survey is a result of our quest for a unifying theme in the existing descriptions of heap abstractions. In particular, our interest lies in the abstractions and not in the algorithms that construct them. In our search of a unified theme, we view a heap abstraction as consisting of two features: a heap model to represent the heap memory and a summarization technique for bounding the heap representation. We classify the models as storeless, store based, and hybrid. We describe various summarization techniques based on k-limiting, allocation sites, patterns, variables, other generic instrumentation predicates, and higher-order logics. This approach allows us to compare the insights of a large number of seemingly dissimilar heap abstractions and also paves way for creating new abstractions by mix-and-match of models and summarization techniques.Comment: 49 pages, 20 figure

arXiv.org e-Print Archive

CiteSeerX

Generalized Points-to Graphs: A New Abstraction of Memory in the Presence of Pointers

Author: Gharat Pritam M.
Khedker Uday P.
Mycroft Alan
Publication venue
Publication date: 28/01/2018
Field of study

Flow- and context-sensitive points-to analysis is difficult to scale; for top-down approaches, the problem centers on repeated analysis of the same procedure; for bottom-up approaches, the abstractions used to represent procedure summaries have not scaled while preserving precision. We propose a novel abstraction called the Generalized Points-to Graph (GPG) which views points-to relations as memory updates and generalizes them using the counts of indirection levels leaving the unknown pointees implicit. This allows us to construct GPGs as compact representations of bottom-up procedure summaries in terms of memory updates and control flow between them. Their compactness is ensured by the following optimizations: strength reduction reduces the indirection levels, redundancy elimination removes redundant memory updates and minimizes control flow (without over-approximating data dependence between memory updates), and call inlining enhances the opportunities of these optimizations. We devise novel operations and data flow analyses for these optimizations. Our quest for scalability of points-to analysis leads to the following insight: The real killer of scalability in program analysis is not the amount of data but the amount of control flow that it may be subjected to in search of precision. The effectiveness of GPGs lies in the fact that they discard as much control flow as possible without losing precision (i.e., by preserving data dependence without over-approximation). This is the reason why the GPGs are very small even for main procedures that contain the effect of the entire program. This allows our implementation to scale to 158kLoC for C programs

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository

Recommended from our members

A Unified Model for Context-Sensitive Program Analyses: The Blind Men and the Elephant

Author: Jaiswal Swati
Khedker Uday P
Mycroft Alan
Publication venue: ACM COMPUTING SURVEYS
Publication date: 01/07/2021
Field of study

Context-sensitive methods of program analysis increase the precision of interprocedural analysis by achieving the effect of call inlining. These methods have been defined using different formalisms and hence appear as algorithms that are very different from each other. Some methods traverse a call graph top-down whereas some others traverse it bottom-up first and then top-down. Some define contexts explicitly whereas some do not. Some of them directly compute data flow values while some first compute summary functions and then use them to compute data flow values. Further, different methods place different kinds of restrictions on the data flow frameworks supported by them. As a consequence, it is difficult to compare the ideas behind these methods in spite of the fact that they solve essentially the same problem. We argue that these incomparable views are similar to those of blind men describing an elephant called context sensitivity, and make it difficult for a non-expert reader to form a coherent picture of context-sensitive data flow analysis. We bring out this whole-elephant view of context sensitivity in program analysis by proposing a unified model of context sensitivity which provides a clean separation between computation of contexts and computation of data flow values. Our model captures the essence of context sensitivity and defines simple soundness and precision criteria for context-sensitive methods. It facilitates declarative specifications of context-sensitive methods, insightful comparisons between them, and reasoning about their soundness and precision. We demonstrate this by instantiating our model to many known context-sensitive methods

Apollo (Cambridge)

Heap Reference Analysis Using Access Graphs

Author: Amey Karkare
Amitabha Sanyal
Gadbois D.
Karkare A.
Khedker U. P.
Reid A.
Shaham R.
Uday P. Khedker
Vallée-Rai R.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/09/2007
Field of study

Despite significant progress in the theory and practice of program analysis, analysing properties of heap data has not reached the same level of maturity as the analysis of static and stack data. The spatial and temporal structure of stack and static data is well understood while that of heap data seems arbitrary and is unbounded. We devise bounded representations which summarize properties of the heap data. This summarization is based on the structure of the program which manipulates the heap. The resulting summary representations are certain kinds of graphs called access graphs. The boundedness of these representations and the monotonicity of the operations to manipulate them make it possible to compute them through data flow analysis. An important application which benefits from heap reference analysis is garbage collection, where currently liveness is conservatively approximated by reachability from program variables. As a consequence, current garbage collectors leave a lot of garbage uncollected, a fact which has been confirmed by several empirical studies. We propose the first ever end-to-end static analysis to distinguish live objects from reachable objects. We use this information to make dead objects unreachable by modifying the program. This application is interesting because it requires discovering data flow information representing complex semantics. In particular, we discover four properties of heap data: liveness, aliasing, availability, and anticipability. Together, they cover all combinations of directions of analysis (i.e. forward and backward) and confluence of information (i.e. union and intersection). Our analysis can also be used for plugging memory leaks in C/C++ languages.Comment: Accepted for printing by ACM TOPLAS. This version incorporates referees' comment

arXiv.org e-Print Archive

Crossref

An improved bound for call strings based interprocedural analysis of bit vector frameworks

Author: Bageshri Karkare
Khedker U. P.
Uday P. Khedker
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date
Field of study

Crossref