107 research outputs found

A Unified Model for Context-Sensitive Program Analyses: The Blind Men and the Elephant

Author: Jaiswal Swati
Khedker Uday P
Mycroft Alan
Publication venue: ACM COMPUTING SURVEYS
Publication date: 01/07/2021

Context-sensitive methods of program analysis increase the precision of interprocedural analysis by achieving the effect of call inlining. These methods have been defined using different formalisms and hence appear as algorithms that are very different from each other. Some methods traverse a call graph top-down whereas some others traverse it bottom-up first and then top-down. Some define contexts explicitly whereas some do not. Some of them directly compute data flow values while some first compute summary functions and then use them to compute data flow values. Further, different methods place different kinds of restrictions on the data flow frameworks supported by them. As a consequence, it is difficult to compare the ideas behind these methods in spite of the fact that they solve essentially the same problem. We argue that these incomparable views are similar to those of blind men describing an elephant called context sensitivity, and make it difficult for a non-expert reader to form a coherent picture of context-sensitive data flow analysis. We bring out this whole-elephant view of context sensitivity in program analysis by proposing a unified model of context sensitivity which provides a clean separation between computation of contexts and computation of data flow values. Our model captures the essence of context sensitivity and defines simple soundness and precision criteria for context-sensitive methods. It facilitates declarative specifications of context-sensitive methods, insightful comparisons between them, and reasoning about their soundness and precision. We demonstrate this by instantiating our model to many known context-sensitive methods
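
To make the precision gain from context sensitivity concrete, here is a minimal sketch in Python. It is not the unified model proposed in the paper. The toy program, the `NAC` marker, and the choice of "call site" as the notion of context are all assumptions made for illustration. It compares a constant-propagation result for an identity function when its parameter is merged across all callers with the result when the parameter is tracked per call site.

```python
# Hypothetical toy program (an assumption, not taken from the paper):
#   def ident(p): return p
#   a = ident(1)   # call site "c1"
#   b = ident(2)   # call site "c2"

NAC = "NAC"  # "not a constant": the top element of this toy constant lattice


def join(x, y):
    """Merge two abstract values; None means 'no information yet'."""
    if x is None:
        return y
    if y is None:
        return x
    return x if x == y else NAC


# (call site, variable receiving the result, constant argument)
CALLS = [("c1", "a", 1), ("c2", "b", 2)]


def analyze_insensitive(calls):
    # A single abstract value for parameter p, merged over every call site.
    p = None
    for _site, _var, arg in calls:
        p = join(p, arg)
    # ident returns p unchanged, so every caller receives the merged value.
    return {var: p for _site, var, _arg in calls}


def analyze_sensitive(calls):
    # One abstract value for p per context; a context here is the call site.
    p_in_context = {}
    for site, _var, arg in calls:
        p_in_context[site] = join(p_in_context.get(site), arg)
    return {var: p_in_context[site] for site, var, _arg in calls}


insensitive = analyze_insensitive(CALLS)
sensitive = analyze_sensitive(CALLS)
print(insensitive)
print(sensitive)
```

The context-insensitive run loses both constants because the two arguments meet at the callee's entry, while the per-call-site run keeps them apart, which is the "effect of call inlining" the abstract refers to.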

Apollo (Cambridge)

Parameterized Algorithms for Scalable Interprocedural Data-flow Analysis

Author: Zaher Ahmed Khaled
Publication date: 20/09/2023

Data-flow analysis is a general technique used to compute information of interest at different points of a program and is considered to be a cornerstone of static analysis. In this thesis, we consider interprocedural data-flow analysis as formalized by the standard IFDS framework, which can express many widely-used static analyses such as reaching definitions, live variables, and null-pointer. We focus on the well-studied on-demand setting in which queries arrive one-by-one in a stream and each query should be answered as fast as possible. While the classical IFDS algorithm provides a polynomial-time solution to this problem, it is not scalable in practice. Specifically, it either requires a quadratic-time preprocessing phase or takes linear time per query, both of which are untenable for modern huge codebases with hundreds of thousands of lines. Previous works have already shown that parameterizing the problem by the treewidth of the program's control-flow graph is promising and can lead to significant gains in efficiency. Unfortunately, these results were only applicable to the limited special case of same-context queries. In this work, we obtain significant speedups for the general case of on-demand IFDS with queries that are not necessarily same-context. This is achieved by exploiting a new graph sparsity parameter, namely the treedepth of the program's call graph. Our approach is the first to exploit the sparsity of control-flow graphs and call graphs at the same time and parameterize by both treewidth and treedepth. We obtain an algorithm with a linear preprocessing phase that can answer each query in constant time with respect to the input size. Finally, we show experimental results demonstrating that our approach significantly outperforms the classical IFDS and its on-demand variant
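
IFDS-style analyses treat a data-flow query as graph reachability over pairs of program point and data-flow fact. The sketch below illustrates that idea for a single procedure only. It omits the call/return matching and summary edges of the full IFDS algorithm, and it does not implement the treewidth/treedepth parameterization described in the thesis. The statements `s0` to `s3`, the taint-style facts, and all function names are hypothetical.

```python
from collections import deque

ZERO = "0"  # the special always-holding fact used in IFDS-style encodings

# Hypothetical straight-line procedure (an assumption for illustration):
#   s0: x = source()
#   s1: y = x
#   s2: x = 0
#   s3: sink(y)
SUCC = {"s0": ["s1"], "s1": ["s2"], "s2": ["s3"], "s3": ["exit"], "exit": []}


def flow(stmt, fact):
    """Map one fact holding before stmt to the facts holding after it."""
    if stmt == "s0":  # x becomes tainted: generated from ZERO, old x killed
        if fact == ZERO:
            return {ZERO, "x"}
        return set() if fact == "x" else {fact}
    if stmt == "s1":  # y = x: y is tainted iff x is; the old y is killed
        if fact == "x":
            return {"x", "y"}
        return set() if fact == "y" else {fact}
    if stmt == "s2":  # x is overwritten with a constant, so x is killed
        return set() if fact == "x" else {fact}
    return {fact}  # every other statement is the identity


def reachable(entry):
    """Breadth-first search over (statement, fact) nodes from (entry, ZERO)."""
    start = (entry, ZERO)
    seen = {start}
    work = deque([start])
    while work:
        stmt, fact = work.popleft()
        for nxt in SUCC[stmt]:
            for out in flow(stmt, fact):
                node = (nxt, out)
                if node not in seen:
                    seen.add(node)
                    work.append(node)
    return seen


seen = reachable("s0")
# A query "does fact f hold just before statement s?" is a membership test.
print(("s3", "y") in seen)
print(("s3", "x") in seen)
```

Because each flow function handles one fact at a time (distributivity), the search never needs to consider sets of facts, which is what keeps the classical algorithm polynomial.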

arXiv.org e-Print Archive

Faster Algorithms for Weighted Recursive State Machines

Author: A Bouajjani
A Lal
E Liberty
G Myers
G Ramalingam
J Knoop
M Sagiv
Q Yu
R Alur
R Giegerich
S Horwitz
S Qadeer
T Ball
T Reps
TW Reps
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Pushdown systems (PDSs) and recursive state machines (RSMs), which are linearly equivalent, are standard models for interprocedural analysis. Yet RSMs are more convenient as they (a) explicitly model function calls and returns, and (b) specify many natural parameters for algorithmic analysis, e.g., the number of entries and exits. We consider a general framework where RSM transitions are labeled from a semiring and path properties are algebraic with semiring operations, which can model, e.g., interprocedural reachability and dataflow analysis problems. Our main contributions are new algorithms for several fundamental problems. As compared to a direct translation of RSMs to PDSs and the best-known existing bounds of PDSs, our analysis algorithm improves the complexity for finite-height semirings (which subsume reachability and standard dataflow properties). We further consider the problem of extracting distance values from the representation structures computed by our algorithm, and give efficient algorithms that distinguish the complexity of a one-time preprocessing from the complexity of each individual query. Another advantage of our algorithm is that our improvements carry over to the concurrent setting, where we improve the best-known complexity for the context-bounded analysis of concurrent RSMs. Finally, we provide a prototype implementation that gives a significant speed-up on several benchmarks from the SLAM/SDV project
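
The idea of labeling transitions with semiring weights, so that one algorithm answers both reachability and distance questions, can be sketched as follows. This is a simplification over a flat graph with no calls or returns, using a naive fixpoint iteration. It is not the RSM algorithm from the paper, and the `Semiring` class, the `path_values` function, and the example graph are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    zero: Any  # identity of combine: the weight of "no path"
    one: Any   # identity of extend: the weight of the empty path
    combine: Callable[[Any, Any], Any]  # merges alternative paths
    extend: Callable[[Any, Any], Any]   # concatenates path segments


# Boolean semiring: the path value answers "is the node reachable?"
BOOLEAN = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)
# Tropical (min, +) semiring: the path value is the shortest distance.
TROPICAL = Semiring(float("inf"), 0, min, lambda a, b: a + b)


def path_values(nodes, edges, source, sr):
    """Combine, over all paths from source, the extend-product of the weights.

    Naive round-robin iteration to a fixpoint. It terminates for the two
    semirings above when tropical weights are non-negative integers.
    """
    value = {n: sr.zero for n in nodes}
    value[source] = sr.one
    changed = True
    while changed:
        changed = False
        for u, v, w in edges:
            new = sr.combine(value[v], sr.extend(value[u], w))
            if new != value[v]:
                value[v] = new
                changed = True
    return value


NODES = ["a", "b", "c", "d"]
EDGES = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5), ("c", "b", 1)]

dist = path_values(NODES, EDGES, "a", TROPICAL)
reach = path_values(NODES, [(u, v, True) for u, v, _ in EDGES], "a", BOOLEAN)
print(dist)
print(reach)
```

Swapping the semiring changes the question being answered without touching the traversal, which is the sense in which the abstract calls path properties "algebraic".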

arXiv.org e-Print Archive

Crossref

IST Austria: PubRep (Institute of Science and Technology)

Symbol-Specific Sparsification of Interprocedural Distributive Environment Problems

Author: Bodden Eric
Karakaya Kadiray
Publication date: 26/01/2024

Previous work has shown that one can often greatly speed up static analysis by computing data flows not for every edge in the program's control-flow graph but instead only along definition-use chains. This yields a so-called sparse static analysis. Recent work on SparseDroid has shown that specifically taint analysis can be "sparsified" with extraordinary effectiveness because the taint state of one variable does not depend on those of others. This allows one to soundly omit more flow-function computations than in the general case. In this work, we now assess whether this result carries over to the more generic setting of so-called Interprocedural Distributive Environment (IDE) problems. As opposed to taint analysis, IDE comprises distributive problems with large or even infinitely broad domains, such as typestate analysis or linear constant propagation. Specifically, this paper presents Sparse IDE, a framework that realizes sparsification for any static analysis that fits the IDE framework. We implement Sparse IDE in SparseHeros, as an extension to the popular Heros IDE solver, and evaluate its performance on real-world Java libraries by comparing it to the baseline IDE algorithm. To this end, we design, implement and evaluate a linear constant propagation analysis client on top of SparseHeros. Our experiments show that, although IDE analyses can only be sparsified with respect to symbols and not (numeric) values, Sparse IDE can nonetheless yield significantly lower runtimes and often also lower memory consumption compared to the original IDE. Comment: To be published in ICSE 202

arXiv.org e-Print Archive

Verification of temporal properties involving multiple interacting objects

Author: Naeem Nomair A.
Publication venue: 'University of Waterloo'
Publication date: 01/01/2013

Defects that arise due to violating a prescribed order for executing statements or executing a disallowed sequence of statements can be hard to detect since the sequence is often spread over multiple functions and source code files. In this dissertation, we develop a verification tool which uses a sound and precise static analysis to verify temporal specifications that can involve multiple objects. Statically analyzing properties that involve multiple objects requires two separate abstractions; one that abstracts the objects in the program and the second which abstracts the state of a group of objects. We present two such abstractions. Objects are abstracted using a storeless heap abstraction. This provides flow-sensitive tracking of individual objects along control flow paths and precise may-alias information. The state abstraction leverages the object abstraction to abstract the state of a group of related objects. We use the IFDS algorithm to implement an analysis that computes the object and state abstractions. Since the original IFDS algorithm is not directly suitable for domains involving objects and pointers, we present four extensions to the original IFDS algorithm. We also present results of an empirical study to measure the precision of the analysis. The performance of the analysis is improved through the use of two types of method summaries. Callee summaries guarantee that using the summary instead of flow-sensitive analysis of the callee does not degrade the precision of the abstraction at the callsite for the callee. For further performance gains, caller summaries that make conservative assumptions for aliasing between parameters of a function call are used. We present results from empirically evaluating the use of these summaries for the object analysis. Finally, to make the analysis practical for use in the development life cycle, we present a verification tool to configure the analysis and visualize the results. 
The tool provides a number of configuration options to run the analysis. The analysis results are presented in a list displaying statements flagged as possible violations of a property and, for each violation, the sequence of events (statements) that leads to this violation

University of Waterloo's Institutional Repository

Boomerang: Demand-Driven Flow- and Context-Sensitive Pointer Analysis for Java

Author: Ali Karim
Bodden Eric
Nguyen Quang Do Lisa
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 30th European Conference on Object-Oriented Programming (ECOOP 2016)
Publication date: 01/01/2016

Many current program analyses require highly precise pointer information about small, targeted parts of a given program. This motivates the need for demand-driven pointer analyses that compute information only where required. Pointer analyses generally compute points-to sets of program variables or answer boolean alias queries. However, many client analyses require richer pointer information. For example, taint and typestate analyses often need to know the set of all aliases of a given variable under a certain calling context. With most current pointer analyses, clients must compute such information through repeated points-to or alias queries, increasing complexity and computation time for them. This paper presents Boomerang, a demand-driven, flow-, field-, and context-sensitive pointer analysis for Java programs. Boomerang computes rich results that include both the possible allocation sites of a given pointer (points-to information) and all pointers that can point to those allocation sites (alias information). For increased precision and scalability, clients can query Boomerang with respect to particular calling contexts of interest. Our experiments show that Boomerang is more precise than existing demand-driven pointer analyses. Additionally, using Boomerang, the taint analysis FlowDroid issues up to 29.4x fewer pointer queries compared to using other pointer analyses that return simpler pointer information. Furthermore, the search space of Boomerang can be significantly reduced by requesting calling contexts from the client analysis

TUbiblio

Fraunhofer-ePrints

Dagstuhl Research Online Publication Server

Lossless, Persisted Summarization of Static Callgraph, Points-To and Data-Flow Analysis

Author: Bodden Eric
Hermann Ben
Schubert Philipp Dominik
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 35th European Conference on Object-Oriented Programming (ECOOP 2021)
Publication date: 01/01/2021

Static analysis is used to automatically detect bugs and security breaches, and aids compiler optimization. Whole-program analysis (WPA) can yield high precision but causes long analysis times and thus does not match common software-development workflows, making it often impractical to use for large, real-world applications. This paper thus presents the design and implementation of ModAlyzer, a novel static-analysis approach that aims at accelerating whole-program analysis by making the analysis modular and compositional. It shows how to compute lossless, persisted summaries for callgraph, points-to and data-flow information, and it reports under which circumstances this function-level compositional analysis outperforms WPA. We implemented ModAlyzer as an extension to LLVM and PhASAR, and applied it to 12 real-world C and C++ applications. At analysis time, ModAlyzer modularly and losslessly summarizes the analysis effect of the library code those applications share, hence avoiding its repeated re-analysis. The experimental results show that the reuse of these summaries can save, on average, 72% of analysis time over WPA. Moreover, because it is lossless, the module-wise analysis fully retains precision and recall. Surprisingly, as our results show, it sometimes even yields precision superior to WPA. The initial summary generation, on average, takes about 3.67 times as long as WPA

Dagstuhl Research Online Publication Server