8 research outputs found

    Flow- and context-sensitive points-to analysis using generalized points-to graphs

    © Springer-Verlag GmbH Germany 2016. Bottom-up interprocedural methods of program analysis construct summary flow functions for procedures to capture the effect of their calls and have been used effectively for many analyses. However, these methods seem computationally expensive for flow- and context-sensitive points-to analysis (FCPA), which requires modelling unknown locations accessed indirectly through pointers. Such accesses are commonly handled by using placeholders to explicate unknown locations or by using multiple call-specific summary flow functions. We generalize the concept of points-to relations by using the counts of indirection levels, leaving the unknown locations implicit. This allows us to create summary flow functions in the form of generalized points-to graphs (GPGs) without the need of placeholders. By design, GPGs represent both memory (in terms of classical points-to facts) and memory transformers (in terms of generalized points-to facts). We perform FCPA by progressively reducing generalized points-to facts to classical points-to facts. GPGs distinguish between may and must pointer updates, thereby facilitating strong updates within calling contexts. The size of GPGs is linearly bounded by the number of variables and is independent of the number of statements. Empirical measurements on SPEC benchmarks show that GPGs are indeed compact in spite of large procedure sizes. This allows us to scale FCPA to 158 kLoC using GPGs (compared to 35 kLoC reported by liveness-based FCPA). Thus GPGs hold a promise of efficiency and scalability for FCPA without compromising precision.
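
    A minimal sketch (our own illustration, not the authors' implementation) of the idea of generalized points-to facts: an edge (x, i, j, y) records indirection levels instead of placeholders, and repeatedly reducing those levels against known classical facts yields ordinary points-to edges. All names below are hypothetical.

```python
# Generalized points-to fact: (x, i, j, y) abstracts "the location reached by
# i-1 dereferences of x is made to point to the location reached by j
# dereferences of y". Classical points-to facts are the special case
# (x, 1, 0, y), i.e. "x points to y".

def reduce_fact(gen_fact, classical_facts):
    """Lower the indirection levels of gen_fact by one step using known
    classical facts, given as a set of (p, q) pairs meaning "p points to q"."""
    x, i, j, y = gen_fact
    reduced = set()
    if i > 1:
        # Resolve one indirection on the source side: if x -> a, then writing
        # through *x writes into a.
        reduced |= {(q, i - 1, j, y) for p, q in classical_facts if p == x}
    elif j > 0:
        # Resolve one indirection on the target side: if y -> b, then the
        # value read from y is b.
        reduced |= {(x, i, j - 1, q) for p, q in classical_facts if p == y}
    else:
        reduced.add(gen_fact)  # already classical (i == 1, j == 0)
    return reduced

# The callee statement "*x = y" is the generalized fact (x, 2, 1, y). In a
# calling context where x -> a and y -> b hold, two reduction steps produce
# the classical fact "a points to b".
facts = {("x", "a"), ("y", "b")}
step1 = reduce_fact(("x", 2, 1, "y"), facts)                 # {("a", 1, 1, "y")}
step2 = {g for f in step1 for g in reduce_fact(f, facts)}    # {("a", 1, 0, "b")}
print(step1, step2)
```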

    An improved bound for call strings based interprocedural analysis of bit vector frameworks

    Interprocedural data flow analysis extends the scope of analysis across procedure boundaries in search of increased optimization opportunities. The call strings based approach is a general method for performing flow- and context-sensitive interprocedural analysis. It maintains a history of calls along with the data flow information in the form of call strings, which are sequences of unfinished calls. Recursive programs may need infinite call strings for interprocedural data flow analysis. For bit vector frameworks, this method is believed to require all call strings of lengths up to 3K, where K is the maximum number of distinct call sites in any call chain. We combine the nature of information flows in bit vector data flow analysis with the structure of interprocedurally valid paths to bound the call strings. Instead of bounding the length of call strings, we bound the number of occurrences of any call site in a call string. We show that call strings in which a call site appears at most three times are sufficient for convergence on the interprocedural maximum fixed point solution. Though this results in the same worst-case length of call strings, it does not require constructing all call strings up to length 3K. Our empirical measurements on recursive programs show that our bound reduces the lengths and the number of call strings, and hence the analysis time, significantly.
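
    To make the occurrence bound concrete, here is a small illustrative sketch (our own, with hypothetical names): instead of allowing every call string of length up to 3K, a call string is extended at a call site only while that site occurs at most three times in it.

```python
from collections import Counter

MAX_OCCURRENCES = 3   # bound on occurrences of any single call site

def extend_call_string(call_string, call_site):
    """Append call_site to call_string (a tuple, most recent call last) if the
    occurrence bound permits it; return None otherwise, in which case the
    analysis falls back on an already constructed string."""
    if Counter(call_string)[call_site] >= MAX_OCCURRENCES:
        return None
    return call_string + (call_site,)

# A recursive program keeps extending a string at the same call site c1;
# growth stops after three occurrences instead of running to length 3K.
s = ()
while (nxt := extend_call_string(s, "c1")) is not None:
    s = nxt
print(s)   # ('c1', 'c1', 'c1')
```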

    A Generalized Theory of Bit Vector Data Flow Analysis

    The classical theory of data flow analysis, which has its roots in unidirectional flows, is inadequate to characterize bidirectional data flow problems. We present a generalized theory of bit vector data flow analysis which explains the known results in unidirectional and bidirectional data flows and provides a deeper insight into the process of data flow analysis. Based on the theory, we develop a worklist-based generic algorithm which is uniformly applicable to unidirectional and bidirectional data flow problems. It is simple, versatile, and easy to adapt for a specific problem. We show that the theory and the algorithm are applicable to all bounded monotone data flow problems which possess the property of separability of solution. The theory yields valuable information about the complexity of data flow analysis. We show that the complexity of worklist-based iterative analysis is the same for unidirectional and bidirectional problems. We also define a measure of the complexity of round-robin iterative analysis. This measure, called width, is uniformly applicable to unidirectional and bidirectional problems and provides a tighter bound for unidirectional problems than the traditional measure of depth. Other applications include explanation of isolated results in efficient solution techniques and motivation of new techniques for bidirectional flows. In particular, we discuss edge splitting and edge placement and develop a feasibility criterion for decomposition of a bidirectional flow into a sequence of unidirectional flows.
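
    The following is a minimal sketch (not taken from the paper) of such a generic worklist solver for bit vector frameworks; the direction of flow is captured entirely by an "influences" relation, so forward, backward, and bidirectional problems are just different instantiations. Names and signatures are our own.

```python
def worklist_solve(nodes, influences, transfer, meet, init):
    """Chaotic iteration to a fixed point.

    nodes      : iterable of CFG node ids
    influences : node -> nodes whose value depends on this node
                 (successors for a forward problem, predecessors for a
                 backward problem, possibly both for a bidirectional one)
    transfer   : (node, value) -> value   (the node's flow function)
    meet       : (value, value) -> value  (union or intersection of bit vectors)
    init       : node -> initial value
    """
    value = {n: init(n) for n in nodes}
    work = list(nodes)
    while work:
        n = work.pop()
        out = transfer(n, value[n])
        for m in influences(n):
            merged = meet(value[m], out)
            if merged != value[m]:
                value[m] = merged
                work.append(m)
    return value

# Example instantiation for a forward "may" problem such as reaching
# definitions: influences = CFG successors, meet = frozenset.union, and
# init = lambda n: frozenset(); the values then converge to the IN sets.
```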

    Heap Abstractions for Static Analysis

    Heap data is potentially unbounded and seemingly arbitrary. Hence, unlike stack and static data, heap data cannot be abstracted in terms of a fixed set of program variables. This makes it an interesting topic of study and there is an abundance of literature employing heap abstractions. Although most studies have addressed similar concerns, insights gained in one description of heap abstraction may not directly carry over to some other description. In our search of a unified theme, we view heap abstraction as consisting of two steps: (a) heap modelling, which is the process of representing a heap memory (i.e., an unbounded set of concrete locations) as a heap model (i.e., an unbounded set of abstract locations), and (b) summarization, which is the process of bounding the heap model by merging multiple abstract locations into summary locations. We classify the heap models as storeless, store-based, and hybrid. We describe various summarization techniques based on k-limiting, allocation sites, patterns, variables, other generic instrumentation predicates, and higher-order logics. This approach allows us to compare the insights of a large number of seemingly dissimilar heap abstractions and also paves the way for creating new abstractions by mix and match of models and summarization techniques.
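
    As a concrete illustration of one point in this design space, below is a small sketch (ours, not from the survey) of a store-based heap model with allocation-site summarization: every concrete object is represented by the label of the statement that allocated it, so the unbounded heap collapses into a bounded set of summary locations and field writes become weak updates.

```python
class AbstractHeap:
    """Store-based heap model summarized by allocation sites."""

    def __init__(self):
        self.var_points_to = {}    # variable -> set of allocation sites
        self.field_points_to = {}  # (site, field) -> set of allocation sites

    def allocate(self, var, site):
        # x = new ... at program point `site`
        self.var_points_to.setdefault(var, set()).add(site)

    def store_field(self, var, field, rhs_var):
        # x.f = y : a weak update, since one site summarizes many objects
        targets = self.var_points_to.get(rhs_var, set())
        for site in self.var_points_to.get(var, set()):
            self.field_points_to.setdefault((site, field), set()).update(targets)

    def load_field(self, var, rhs_var, field):
        # x = y.f
        result = set()
        for site in self.var_points_to.get(rhs_var, set()):
            result |= self.field_points_to.get((site, field), set())
        self.var_points_to[var] = result

# Example: a loop allocating list nodes at one site collapses to a single
# summary location that points to itself through the "next" field.
h = AbstractHeap()
h.allocate("p", "s1")          # p = new Node()   (site s1, executed many times)
h.store_field("p", "next", "p")
print(h.field_points_to)       # {('s1', 'next'): {'s1'}}
```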

    Efficiency, precision, simplicity, and generality in interprocedural data flow analysis: resurrecting the classical call strings method

    The full call strings method is the most general, simplest, and most precise method of performing context sensitive interprocedural data flow analysis. It remembers contexts using call strings. For full precision, all call strings up to a prescribed length must be constructed. Two limitations of this method are (a) it cannot be used for frameworks with infinite lattices, and (b) the prescribed length is quadratic in the size of the lattice, resulting in an impractically large number of call strings. These limitations have resulted in a proliferation of ad hoc methods which compromise on generality, precision, or simplicity. We propose a variant of the classical full call strings method which reduces the number of call strings, and hence the analysis time, by orders of magnitude, as corroborated by our empirical measurements. It reduces the worst case call string length from quadratic in the size of the lattice to linear. Further, unlike the classical method, this worst case length need not be reached. Our approach retains the precision, generality, and simplicity of the call strings method without imposing any additional constraints. It can accommodate demand-driven approximations and hence can be used for frameworks with infinite lattices.
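
    The sketch below (a simplification under our own assumptions, since the abstract does not spell out the mechanism) illustrates one way such a reduction can work: call strings that carry the same data flow value at a call node are interchangeable, so the callee needs to be analysed only once per distinct value, and the resulting values are copied back to all equivalent strings at the return node.

```python
def partition_by_value(values_at_call):
    """Group call strings by the data flow value they carry at the call node.
    Returns a dict: value -> list of call strings; strings[0] can serve as the
    representative propagated through the callee."""
    classes = {}
    for cs, val in values_at_call.items():
        classes.setdefault(val, []).append(cs)
    return classes

def regenerate_at_return(callee_exit_value, classes):
    """Copy the callee's exit value, computed once per distinct input value,
    back to every call string in the corresponding equivalence class."""
    out = {}
    for val, strings in classes.items():
        for cs in strings:
            out[cs] = callee_exit_value[val]
    return out

# Example: three calling contexts but only two distinct values, so the callee
# is analysed twice instead of three times.
at_call = {("c1",): frozenset({"a"}), ("c2",): frozenset({"a"}), ("c3",): frozenset()}
classes = partition_by_value(at_call)
callee_out = {frozenset({"a"}): frozenset({"a", "b"}), frozenset(): frozenset({"b"})}
print(regenerate_at_return(callee_out, classes))
```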

    Validation of GCC optimizers through trace generation

    The translation validation approach involves establishing semantics preservation of individual compilations. In this paper, we present a novel framework for translation validation of optimizers. We identify a comprehensive set of primitive program transformations that are commonly used in many optimizations. For each primitive, we define soundness conditions that guarantee that the transformation is semantics preserving. This framework of transformations and soundness conditions is independent of any particular compiler implementation and is formalized in PVS. An optimizer is instrumented to generate the trace of an optimization run in terms of the predefined transformation primitives. The validation succeeds if (1) the trace conforms to the optimization and (2) the soundness conditions of the individual transformations in the trace are satisfied. The first step eliminates the need to trust the instrumentation. The soundness conditions are defined in a temporal logic and therefore the second step involves model checking. Thus the scheme is completely automatable. We have applied this approach to several intraprocedural optimizations of RTL intermediate code in GNU Compiler Collection (GCC) v4.1.0, namely, loop invariant code motion, partial redundancy elimination, lazy code motion, code hoisting, and copy and constant propagation for sample programs written in a subset of the C language. The validation does not require information about program analyses performed by GCC. Therefore even though the GCC code base is quite large and complex, instrumentation could be achieved easily. The framework requires an estimated 21 lines of instrumentation code and 140 lines of PVS specifications for every 1000 lines of the GCC code considered for validation.
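
    The overall structure of the two validation steps can be sketched as follows (a schematic illustration with hypothetical names; the actual framework expresses soundness conditions in temporal logic and discharges them by model checking in PVS rather than as executable predicates).

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Primitive:
    name: str
    apply: Callable[[object], object]     # how the primitive rewrites a program
    soundness: Callable[[object], bool]   # condition making it semantics preserving

def validate(trace: List[Primitive], original, optimized) -> bool:
    program = original
    for step in trace:
        # Step 2: the soundness condition must hold on the program the
        # primitive is about to transform.
        if not step.soundness(program):
            return False
        program = step.apply(program)
    # Step 1: replaying the trace must reproduce the optimizer's output,
    # which removes the need to trust the instrumentation itself.
    return program == optimized
```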

    Generalized Points-to Graphs: A New Abstraction of Memory in the Presence of Pointers

    Flow- and context-sensitive points-to analysis is difficult to scale; for top-down approaches, the problem centers on repeated analysis of the same procedure; for bottom-up approaches, the abstractions used to represent procedure summaries have not scaled while preserving precision. We propose a novel abstraction called the Generalized Points-to Graph (GPG) which views points-to relations as memory updates and generalizes them using the counts of indirection levels, leaving the unknown pointees implicit. This allows us to construct GPGs as compact representations of bottom-up procedure summaries in terms of memory updates and control flow between them. Their compactness is ensured by the following optimizations: strength reduction reduces the indirection levels, redundancy elimination removes redundant memory updates and minimizes control flow (without over-approximating data dependence between memory updates), and call inlining enhances the opportunities of these optimizations. We devise novel operations and data flow analyses for these optimizations. Our quest for scalability of points-to analysis leads to the following insight: the real killer of scalability in program analysis is not the amount of data but the amount of control flow that it may be subjected to in search of precision. The effectiveness of GPGs lies in the fact that they discard as much control flow as possible without losing precision (i.e., by preserving data dependence without over-approximation). This is the reason why GPGs are very small even for main procedures that contain the effect of the entire program. This allows our implementation to scale to 158 kLoC for C programs.
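
    As a complement to the single-fact reduction sketched earlier, here is an illustrative sketch (our own simplification: the callee's GPG is treated as a straight-line sequence of generalized updates, ignoring its internal control flow) of applying a procedure summary at a call site, including the strong/weak update distinction.

```python
def resolve(points_to, var, levels):
    """Locations reached from var by `levels` dereferences in the caller."""
    locs = {var}
    for _ in range(levels):
        nxt = set()
        for l in locs:
            nxt |= points_to.get(l, set())
        locs = nxt
    return locs

def apply_summary(gpg_updates, points_to):
    """Apply a callee's generalized updates (x, i, j, y) to the caller's
    points-to map; a strong update kills old targets when the left-hand side
    resolves to exactly one location, otherwise the update is weak."""
    pt = {v: set(t) for v, t in points_to.items()}
    for x, i, j, y in gpg_updates:
        sources = resolve(pt, x, i - 1)    # locations being written
        targets = resolve(pt, y, j)        # locations being read
        strong = len(sources) == 1
        for s in sources:
            if strong:
                pt[s] = set(targets)                       # strong update
            else:
                pt.setdefault(s, set()).update(targets)    # weak update
    return pt

# Example: the caller knows p -> {a} and q -> {b}; the callee summary contains
# "*p = q", i.e. the generalized update (p, 2, 1, q), so a ends up pointing to b.
caller = {"p": {"a"}, "q": {"b"}}
print(apply_summary([("p", 2, 1, "q")], caller)["a"])   # {'b'}
```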